German Twitter Titling Corpus
The German Titling Twitter Corpus consists of 1904 stance-annotated tweets collected in June/July 2018 mentioning 24 German politicians with a doctoral degree. The Addendum... -
WikiWarsDE Corpus
The WikiWarsDE corpus is a German corpus containing Wikipedia articles with annotations of temporal expressions. Its creation was motivated by the English WikiWars corpus (Mazur... -
Polish Spatial Texts (PST) 2.0
The extended version of the Polish Spatial Texts corpus. Texts derived from Polish travel blogs, manually annotated with spatial expressions. A spatial expression is a text fragment... -
Polish Corpus of Wrocław University of Technology 1.3 Korpus Języka Polskieg...
KPWr (Polish Corpus of Wrocław University of Technology, pol. Korpus Języka Polskiego Politechniki Wrocławskiej) is a corpus of written and spoken documents available on the... -
PELCRA EMO corpus
The corpus comprises 30 focused structured interviews (17 hours and ca. 200000 word tokens) centred on the topic of emotions. The corpus has bibliographic, morphosyntactic and... -
DiaBiz.Kom sample 1.0
DiaBiz.Kom sample is a sample of the DiaBiz.Kom corpus, a dialog corpus comprising transcriptions of phone-based customer-agent interactions in several key business domains... -
Polish Spatial Texts (PST) 1.0
Texts derived from Polish travel blogs, manually annotated with spatial expressions. A spatial expression is a text fragment which describes a relative location of two or more... -
KPWr annotation guidelines - phrase lemmatization
Annotation guidelines for manual phrase lemmatisation in KPWr (Polish Corpus of Wrocław University of Technology). -
KPWr chunks 2021
357 documents from the KPWr corpus, manually annotated at the syntactic level (chunks). Please cite as: Oleksy, M., Walentynowicz, W., & Wieczorek, J. (2021). New approach to the... -
KPWr annotation guidelines - keywords (1.0)
Annotation guidelines (first version) for keywords in KPWr (Polish Corpus of Wrocław University of Technology, https://clarin-pl.eu/dspace/handle/11321/270). -
The Adventure of the Speckled Band 1.0 (manually tagged)
"The Adventure of the Speckled Band" (pol. "Sherlock Holmes i Pstrokata Opaska") by Arthur Conan Doyle - modern Polish translation manually tagged with morphological... -
HamleDT 3.0
HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that... -
OpenLegalData (2022 - Corpus)
OpenLegalData is a free and open platform that makes legal documents and information available to the public. The aim of this platform is to improve the transparency of... -
Etalon 1.0
Etalon is a manually annotated corpus of contemporary Czech. The corpus contains 1,885,589 words (2,265,722 tokens) and is annotated in the same way as SYN2020 of the Czech... -
STYX 1.0
STYX 1.0 is a corpus of Czech sentences selected from the Prague Dependency Treebank. The criterion for including sentences in STYX was their suitability for practicing Czech... -
Szeged Corpus 1.0
Written, monolingual, general, manually POS-annotated reference corpus; 1,247,546 tokens; MSD tagset, XML (TEIxLite) files -
Czech Malach Cross-lingual Speech Retrieval Test Collection
The package contains Czech recordings from the Visual History Archive, which consists of interviews with Holocaust survivors. The archive consists of audio recordings, four... -
Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2014 – VERSION 1)
German version: see below. The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed with the aim of enabling a broad-based linguistic analysis of the... -
The Diorisis Ancient Greek Corpus
An annotated corpus of literary Ancient Greek sourced from the Perseus Canonical Greek Lit repository (https://github.com/PerseusDL/canonical-greekLit), “The Little Sailing”... -
Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2018 – VERSION 1)
German version: see below. The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed with the aim of enabling a broad-based linguistic analysis of the...