Lexicon of historical Slovene imp25k 1.1

The imp25k lexicon of historical Slovene was created automatically from the goo300k and foo3M annotated corpora and contains attested, manually verified word forms and their annotations, with examples of use. A lexicon entry contains the modern lemma with its part-of-speech and, for archaic words, its gloss (the closest modern equivalent(s) or a short explanation of its meaning). The lemma is followed by those of its modern word forms that occur in the corpus (i.e. the complete paradigm of the lemma is not given), and each of these is followed by all its attested historical word forms with examples of usage.

The lexicon is available in source TEI P5 XML and in a much smaller and simpler derived tabular format, which does not contain usage examples. In the latter, multi-word units are joined with an underscore. The 1st column is the word form, the 2nd its modern equivalent, the 3rd its modern lemma, the 4th its PoS tag from the IMP morphosyntactic specification, and the 5th (where present) the gloss, e.g.:

ako_ravno	akoravno	akoravno	C	čeprav

or

ak-li	ako_li	ako_li	C_Q
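As an illustration of the tabular format, the following minimal Python sketch reads one line into its five columns (word form, modern equivalent, modern lemma, PoS tag, optional gloss). It assumes that the columns are separated by tab characters, which the description above does not state explicitly; the names `LexEntry`, `parse_line` and `split_units` are hypothetical and not part of the distribution.

```python
from typing import List, NamedTuple, Optional


class LexEntry(NamedTuple):
    """One row of the derived tabular lexicon (field names are illustrative)."""
    form: str             # attested historical word form
    modern_form: str      # modern equivalent of the word form
    lemma: str            # modern lemma
    pos: str              # PoS tag from the IMP morphosyntactic specification
    gloss: Optional[str]  # gloss, given only for some entries


def parse_line(line: str) -> LexEntry:
    """Split one row into columns, assuming a tab separator (an assumption)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 columns, got {len(fields)}: {line!r}")
    # The 5th column is optional; treat a missing or empty value as no gloss.
    gloss = fields[4] if len(fields) == 5 and fields[4] else None
    return LexEntry(fields[0], fields[1], fields[2], fields[3], gloss)


def split_units(value: str) -> List[str]:
    """Undo the underscore joining of multi-word units."""
    return value.split("_")
```

For the two example rows given in the description, `parse_line` would yield an entry with the gloss `čeprav` for `ako_ravno` and an entry without a gloss for `ak-li`; `split_units("ako_li")` gives the two words `ako` and `li`.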

Identifier
PID http://hdl.handle.net/11356/1032
Related Identifier https://doi.org/10.1007/s10579-015-9294-7
Related Identifier https://nl.ijs.si/imp/index-en.html
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1032
Provenance
Creator Erjavec, Tomaž
Publisher Jožef Stefan Institute
Publication Year 2014
Funding Reference info:eu-repo/grantAgreement/EC/FP7/215064
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); PUB; https://creativecommons.org/licenses/by/4.0/
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics