Eesti ilukirjanduse korpus Corpus of Estonian fiction
Corpus of Estonian fiction from 1990 onward. 5.6 million words in total. More info at http://www.cl.ut.ee/korpused/segakorpus/eesti_ilukirjandus_1990 A text corpus containing Estonian... -
VESPA
The aim of the VESPA learner corpus project is to build a large collection of disciplinary writing by L2 English university students across registers, disciplines and degrees of... -
Eesti puudepanga korpus Estonian Treebank
Estonian Treebank is available in both the VISL and TigerXML formats. Esttre consists of ca. 1400 manually annotated sentences (10600 tokens); the text classes represented in the... -
Segakorpus: Riigikogu Corpus of the Proceedings of Estonian Parliament
Riigikogu corpus. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/segakorpus/riigikogu/index.php?lang=et Corpus of the Proceedings of Estonian... -
Morphological analyzer for Estonian ESTMORF
ESTMORF is a computer program for analysing unrestricted Estonian text. ESTMORF is implemented in a most straightforward way: it compares word forms of the running text with... -
Eesti murdekorpus Estonian Dialect Corpus
Corpus. More info at http://www.murre.ut.ee/estonian-dialect-corpus/ The dialect corpus consists of: 1) Dialect recordings. The corpus is based on dialect recordings which... -
Eesti ajakirjanduse korpus Corpus of Estonian newspaper texts
The corpus contains Estonian newspapers, 182 million words. TEI P5 XML markup, UTF-8 encoding. More info at http://www.cl.ut.ee/korpused/ Corpus of Estonian newspaper texts, 182... -
Eesti emotsionaalse kõne korpus Estonian Emotional Speech Corpus
The corpus contains 1234 Estonian sentences expressing anger, joy, or sadness, as well as neutral sentences. Female voice, 44.1 kHz, 16-bit, mono; wav, textgrid:... -
Opinio
Twitter data corpus from, on the one hand, French-speaking Belgian political accounts and, on the other hand, a sample of accounts from the French-speaking Belgian population.... -
The Morphologically Annotated Part of BulTreeBank
This distribution represents only the morphological information encoded in BulTreeBank, an HPSG-based treebank of Bulgarian. It contains about 214,000 tokens. It was used for the...