- CMC training corpus Janes-Tag 2.0
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- Frequency lists of word-level n-grams from the Gigafida 2.0 corpus
Frequency lists of word-level n-grams (or word sets) were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST...
- Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1
Frequency lists of word-level n-grams (or word sets) were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction...
- Consonant-vowel structures in the GOS 1.0 corpus 1.1
The lists contain consonant-vowel structures of all lemmas, word forms, and standardized word forms in the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040)....
- Frequency lists of word parts from the GOS 1.0 corpus 1.1
Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool...
- Consonant-vowel structures in the Gigafida 2.0 corpus
The lists contain consonant-vowel structures of all lemmas and word forms in the Gigafida 2.0 corpus. In each unit, its characters were converted as follows: C - consonant (in...
- CMC training corpus Janes-Norm 1.0
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation,...
- CMC training corpus Janes-Tag 1.0
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence...
- List of word relations from the Sloleks 2.0 lexicon 1.1
This entry consists of a TSV file containing a list of 66,347 Slovene word pairs from the Sloleks Morphological Lexicon of Slovene (v2.0; http://hdl.handle.net/11356/1230) that...
- Annotated corpora and tools of the PARSEME Shared Task on Automatic Identific...
This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make...