This entry consists of a TSV file containing a list of 66,347 Slovene word pairs from the Sloleks Morphological Lexicon of Slovene (v2.0) that have been automatically identified as morphologically related according to a number of manually designed morphological relation rules (e.g. "dež" -> "deževen", "pisati" -> "pisatelj", "prijatelj" -> "prijateljica").
Each line in the list contains the following columns:
- original lemma (e.g. "pisati"),
- related lemma (e.g. "pisatelj"),
- original lemma, automatically deconstructed into individual word parts (e.g. "pis_ati"),
- related lemma, automatically deconstructed into individual word parts (e.g. "pis_at_elj"),
- MTE-6 lexical features of the original lemma (e.g. "G"),
- MTE-6 lexical features of the related lemma (e.g. "Som"),
- ID of the original lemma from Sloleks 2.0,
- ID of the related lemma from Sloleks 2.0,
- the overlapping or central part (common to both the original and the related lemmas; e.g. "pis"),
- the ID of the morphological relation rule used to identify the relation (e.g. "G.Som.5.2.1"),
- the morphological relation rule (e.g. "[G]_ati -> [G]_at_elj").
MTE-6 refers to the MULTEXT-East Version 6 morphosyntactic specifications for Slovenian, available at
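The column layout above can be read into named fields as in the following minimal sketch. It assumes the file has no header row, that the columns appear in the order listed, and that any further columns (such as evaluation scores) come after the first eleven; none of this is confirmed by the description. The field names and the two lemma IDs in the sample line are hypothetical.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Relation(NamedTuple):
    """One word pair; field names are illustrative, not part of the dataset."""
    lemma: str
    related_lemma: str
    lemma_parts: str
    related_parts: str
    lemma_features: str
    related_features: str
    lemma_id: str
    related_id: str
    common_part: str
    rule_id: str
    rule: str


def read_relations(handle: TextIO) -> Iterator[Relation]:
    """Yield one Relation per tab-separated line with at least 11 columns."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 11:
            # Assumption: any extra columns follow the eleven listed above.
            yield Relation(*row[:11])


# Sample line built from the examples in the description; the two IDs are
# placeholders, not real Sloleks identifiers.
sample = "\t".join([
    "pisati", "pisatelj", "pis_ati", "pis_at_elj", "G", "Som",
    "PLACEHOLDER_ID_1", "PLACEHOLDER_ID_2", "pis",
    "G.Som.5.2.1", "[G]_ati -> [G]_at_elj",
]) + "\n"

rows = list(read_relations(io.StringIO(sample)))
print(rows[0].lemma, "->", rows[0].related_lemma, "via", rows[0].rule_id)
```

For the real file, the same function could be called on a handle opened with encoding="utf-8" and newline="".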
Each rule constitutes a pattern to form a morphological relation. For instance, "[G]_ati -> [G]_at_elj" indicates that a verb (G) ending with the word part "ati" is related to the lemma formed by replacing "_ati" with "_at_elj".
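As a rough illustration of how such a pattern can be read, the sketch below applies a rule string to a segmented lemma. It is not the extraction process used to build the list. It assumes that the bracketed tag stands for the stem of the original lemma, that the text after the tag is the sequence of word parts to match or produce, and that the tag can be compared with the start of the lexical features; the function name apply_rule is hypothetical.

```python
from typing import Optional


def apply_rule(rule: str, segmented_lemma: str, features: str) -> Optional[str]:
    """Return the segmented related lemma, or None if the rule does not apply."""
    source, target = (side.strip() for side in rule.split("->"))
    # "[G]_ati" -> tag "G", word parts "_ati"
    source_tag, source_parts = source.lstrip("[").split("]", 1)
    _, target_parts = target.split("]", 1)

    # Assumption: the rule tag is matched against the start of the features.
    if not features.startswith(source_tag):
        return None
    if not segmented_lemma.endswith(source_parts):
        return None

    # Cut off the matched word parts and attach the target word parts.
    stem = segmented_lemma[: len(segmented_lemma) - len(source_parts)]
    return stem + target_parts


rule = "[G]_ati -> [G]_at_elj"
related = apply_rule(rule, "pis_ati", "G")
print(related)                   # segmented form
print(related.replace("_", ""))  # surface form
```

A lemma such as "gri_sti" does not end in "_ati", so the function returns None for it under this rule.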
Note that the list contains no proper nouns. It also contains no relations for 38 morphological rules that are included in the hierarchy of rules (listed in the accompanying file nssss_sloleks_word_relation_rules.tsv) but depend on additional rules that have not yet been implemented in the current version of the extraction process (such as irregular conversions in overlapping word parts: "gri_sti" - "griz_enj_e", "sneg" - "snež_ak").
Version 1.1 also contains manual evaluation scores for approximately 5,000 pairs, which were sampled in a stratified manner (by rule). Each pair was reviewed by a linguist and assigned one of three scores (0 - inadequate; 1 - acceptable; 2 - adequate).