KGR10 FastText Polish word embeddings
Distributional language model (in both textual and binary formats) for Polish (word embeddings), trained on the KGR10 corpus (over 4 billion words) using fastText with the following variants...
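fastText differs from plain word2vec-style embeddings in that each word vector is built from the vectors of its character n-grams, which lets the model produce vectors for out-of-vocabulary Polish word forms. A minimal sketch of the n-gram extraction step (the function name and the n-gram length are illustrative, not taken from the KGR10 release):

```python
# Sketch of fastText's subword idea: a word is represented by the set of its
# character n-grams (here 3-grams). fastText wraps the word in angle brackets
# so that prefixes and suffixes get distinct n-grams.
def char_ngrams(word: str, n: int = 3) -> list[str]:
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

print(char_ngrams("kot"))  # Polish for "cat" -> ['<ko', 'kot', 'ot>']
```

In the full model, the vector for a word is the sum of the vectors of these n-grams (plus the word itself when it is in the vocabulary), which is why inflected forms of the same Polish lemma end up close together.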
KGR10-RoBERTa
Polish RoBERTa model pre-trained on the KGR10 corpus.
Lithuanian Word embeddings
GloVe-type word vectors (embeddings) for Lithuanian. The Delfi.lt corpus (~70 million words) and StanfordNLP were used for training. The training consisted of several stages: 1)...
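GloVe-style embeddings are typically queried by cosine similarity between word vectors. A self-contained sketch of that query, using made-up three-dimensional vectors purely for illustration (real GloVe vectors are usually 100-300 dimensions and would be loaded from the released text file):

```python
import math

# Cosine similarity: the standard way to compare word vectors from a
# GloVe-style model. Values near 1 mean the words occur in similar contexts.
def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (invented for this example, not from the Delfi.lt model).
vilnius = [0.8, 0.1, 0.3]
kaunas = [0.7, 0.2, 0.4]
print(round(cosine(vilnius, kaunas), 3))
```

Identical vectors score 1.0 and orthogonal ones 0.0, so ranking a vocabulary by this score against a query vector yields nearest-neighbour words.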
LitLat BERT
Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model trained on Lithuanian, Latvian, and English data. A state-of-the-art tool representing...