- The Level Stress recordings: Våmhus_08
Recording equipment: The recordings were made with a digital recorder (Fostex FR-2LE) and two AKG C451 B microphones placed on the table in front of the speakers. The...
- ELMCIP Electronic Literature Knowledge Base: Critical Writing
The database ELMCIP Critical writing includes monographs, book chapters, journal articles, reviews etc. written about electronic literature or referenced in electronic...
- ELMCIP Electronic Literature Knowledge Base: Creative Works
The ELMCIP Creative Works database contains works of electronic literature, digital literary art, and print antecedents. Column titles in the data correspond to the data fields...
- Swe-NERC
A resource for training and evaluation of Named Entity Recognition for Swedish
- Slovene corpus for aspect-based sentiment analysis - SentiCoref 1.0
SentiCoref 1.0 corpus consists of 837 documents selected from SentiNews 1.0 corpus (http://hdl.handle.net/11356/1110). The documents were selected based on the number of...
- CMC training corpus Janes-Norm 1.1
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,...
- Croatian Twitter training corpus ReLDI-NormTag-hr 1.1
ReLDI-NormTag-hr 1.1 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word...
- Developmental corpus Šolar 2.0
The Developmental corpus Šolar 2.0 consists of 5,485 texts written by students in Slovene secondary schools (age 15-19) and pupils in the 7th-9th grade of primary school...
- Frequency list of language problems from Šolar 3.0
The dataset comprises 36,570 examples of student writing from Slovenian primary and secondary schools, together with authentic (teacher-provided) corrections of language problems...
- Multimodal corpus EVA 1.0
EVA Corpus 1.0 consists of one episode of an audio/video session plus corresponding orthographic transcriptions with a duration of 57 minutes. The multi-party spontaneous...
- Annotated Corpus of Pre-Standardized Balkan Slavic Literature 1.1
The corpus contains 23 linguistically annotated samples of "damaskini" and other Balkan Slavic manuscripts and print editions from the 15th-19th century, together with over 50...
- Terminology identification dataset KAS-term 1.0
The dataset contains 22,950 term candidates extracted from 15 Slovenian PhD theses. The term candidates are of length 1 to 4, extracted via morphosyntactic patterns and the...
- Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0
Knowledge-Enhanced Winograd Schema Challenge KE-WSC is an upgraded version of the original WSC dataset. It includes the following extensions: Annotation of semantically or...
- CMC training corpus Janes-Tag 1.2
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- Tweet code-switching corpus Janes-Preklop 1.0
Janes-Preklop is a corpus of Slovene tweets that is manually annotated for code-switching (the use of words from two or more languages within one sentence or utterance),...
- Font ZRCalo 1.0
ZRCalo is an open font meant to gradually phase out the ZRCola font as one of the components of the ZRCola 2 input system (http://hdl.handle.net/11356/1090). The current version...
- Opinion corpus of Slovene web commentaries KKS 1.001
The corpus of web commentaries with sentiment categorizations was developed as part of a BSc thesis (Kadunc, 2016) and served for evaluation of the Slovene Sentiment Lexicon KSS...
- CMC training corpus Janes-Tag 2.1
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.1
ReLDI-NormTagNER-sr 2.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation,...
- xLiMe Twitter Corpus XTC 1.0.1
The xLiMe Twitter Corpus contains tweets in German, Italian and Spanish manually annotated with part-of-speech, named entities, and message-level sentiment polarity. In total,...