Lithuanian Word embeddings

GloVe-type word vectors (embeddings) for Lithuanian. The Delfi.lt corpus (~70 million words) and StanfordNLP were used for training. The training consisted of several stages: 1) the vocabulary was compiled, eliminating words with a frequency of less than 5; 2) a word co-occurrence matrix was generated with a window size of 5; 3) this matrix was randomly shuffled; 4) word vectors were generated (100 iterations, 200 dimensions). The final result consists of 331 203 unique word vectors.
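Stages 1 and 2 can be illustrated with a minimal Python sketch. It is not the tooling used to build this resource: the function names `build_vocab` and `cooccurrence` are hypothetical, the symmetric window and the 1/distance weighting are assumptions made for illustration, and out-of-vocabulary tokens are simply skipped while still occupying a position in the window.

```python
from collections import Counter, defaultdict


def build_vocab(tokens, min_count=5):
    """Stage 1 (sketch): keep words seen at least `min_count` times.

    Words are indexed by descending frequency, ties broken alphabetically.
    """
    counts = Counter(tokens)
    kept = [(word, count) for word, count in counts.items() if count >= min_count]
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    return {word: index for index, (word, _) in enumerate(kept)}


def cooccurrence(tokens, vocab, window=5):
    """Stage 2 (sketch): weighted co-occurrence counts within `window` tokens.

    Assumptions: the window is symmetric and each pair contributes
    1/distance; neither detail is stated in the resource description.
    """
    cooc = defaultdict(float)
    ids = [vocab.get(token) for token in tokens]  # None marks a dropped word
    for i, center in enumerate(ids):
        if center is None:
            continue
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break
            context = ids[j]
            if context is None:
                continue
            weight = 1.0 / distance
            cooc[(center, context)] += weight
            cooc[(context, center)] += weight
    return dict(cooc)


if __name__ == "__main__":
    sample = ["a", "b"] * 5 + ["c"]  # toy token stream, not Delfi.lt data
    vocab = build_vocab(sample, min_count=5)
    matrix = cooccurrence(sample, vocab, window=5)
    print(len(vocab), "words,", len(matrix), "non-zero cells")
```

In a full pipeline the resulting cells would then be shuffled (stage 3) and passed to the vector training step (stage 4); both are omitted here.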

Identifier
PID http://hdl.handle.net/20.500.11821/26
Related Identifier http://mwe.lt/
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/26
Provenance
Creator Bielinskienė, Agnė; Boizou, Loïc; Bumbulienė, Ieva; Kovalevskaitė, Jolanta; Krilavičius, Tomas; Mandravickaitė, Justina; Rimkutė, Erika; Vilkaitė-Lozdienė, Laura
Publisher Baltic Institute of Advanced Technology; Vytautas Magnus University
Publication Year 2019
Rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; PUB
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type lexicalConceptualResource
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics