CLARIN - Repositories

Eesti ilukirjanduse korpus Corpus of Estonian fiction

Corpus of Estonian fiction since 1990. 5.6 million words in total. More info at http://www.cl.ut.ee/korpused/segakorpus/eesti_ilukirjandus_1990 A text corpus containing Estonian...

VESPA

The aim of the VESPA learner corpus project is to build a large collection of disciplinary writing by L2 English university students across registers, disciplines and degrees of...

Eesti puudepanga korpus Estonian Treebank

Estonian Treebank is available in both the VISL and TigerXML formats. Esttre consists of ca. 1400 manually annotated sentences (10600 tokens), the text classes represented in the...

Segakorpus: Riigikogu Corpus of the Proceedings of Estonian Parliament

Riigikogu corpus. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/segakorpus/riigikogu/index.php?lang=et Corpus of the Proceedings of Estonian...

Morphological analyzer for Estonian ESTMORF

ESTMORF is a computer program for analysing unrestricted Estonian text. ESTMORF is implemented in a most straightforward way: it compares word forms of the running text with...

Eesti murdekorpus Estonian Dialect Corpus

Corpus. More info at http://www.murre.ut.ee/estonian-dialect-corpus/ The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings which...

Eesti ajakirjanduse korpus Corpus of Estonian newspaper texts

The corpus contains Estonian newspapers, 182 million words. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/ Corpus of Estonian newspaper texts, 182...

Eesti emotsionaalse kõne korpus Estonian Emotional Speech Corpus

The corpus contains 1234 Estonian sentences expressing anger, joy, and sadness, as well as neutral sentences. Female voice, 44.1 kHz, 16-bit, mono; wav, textgrid:...

Opinio

Twitter data corpus from, on the one hand, French-speaking Belgian political accounts and, on the other hand, a sample of accounts from the French-speaking Belgian population....

The Morphologically Annotated Part of BulTreeBank

This distribution represents only the morphological information encoded in BulTreeBank, the HPSG-based Treebank of Bulgarian. It contains about 214,000 tokens. It was used for the...

4,731 datasets found