Klassifikation von Tragödien und Komödien bei Calderón de la Barca
Data publication accompanying the article "Klassifikation von Tragödien und Komödien bei Calderón de la Barca" ("Classification of Tragedies and Comedies in Calderón de la Barca"): the spoken text of 64 dramas by Pedro Calderón de la Barca,...
PoLitBert_v50k_linear_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.
PoLitBert_v32k_tri_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.
KGR10 FastText Polish word embeddings
Distributional language model (word embeddings) for Polish, in both textual and binary formats, trained on the KGR10 corpus (over 4 billion words) using fastText, with the following variants...
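The textual variant of such embeddings can be read without any extra library. The sketch below assumes the textual files follow the usual fastText `.vec` layout (a header line `<vocab_size> <dim>`, then one word followed by its vector per line); that layout is an assumption, not something stated in this entry. The names `parse_vec` and `cosine` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence


def parse_vec(lines: Sequence[str]) -> Dict[str, List[float]]:
    """Parse embeddings in the assumed textual .vec layout:
    header "<vocab_size> <dim>", then "<word> <v1> ... <v_dim>" per line."""
    vocab_size, dim = (int(x) for x in lines[0].split())
    vectors: Dict[str, List[float]] = {}
    for line in lines[1:1 + vocab_size]:
        parts = line.split()  # fastText words contain no whitespace
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = values
    return vectors


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For a toy file `["3 2", "kot 1.0 0.0", "pies 0.8 0.6", "dom 0.0 1.0"]`, `cosine(vecs["kot"], vecs["pies"])` is approximately 0.8. For multi-gigabyte files one would read the file line by line instead of holding all lines in memory.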
PoLitBert_v32k_linear_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.
PoLitBert_v32k_tri_125k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.
TimeAssign
TimeAssign is a program that recognizes temporal expressions in Polish text and assigns TimeML labels to their words, using a Bi-LSTM-based neural network and wordform embeddings.
Word embeddings for Polish (KGR10, Fasttext binary) kgr10_fasttext_bin_v1
Distributional language model (binary) for Polish trained on the KGR10 corpus using fastText (vector dimension: 100).
PoLitBert_v32k_linear_125k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.
Word embeddings CLARIN.SI-embed.sl 2.0
CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g. GigaFida, Janes, KAS, slWaC, MaCoCu-sl,...
Word embeddings CLARIN.SI-embed.mk 0.1
CLARIN.SI-embed.mk contains word embeddings induced from a large collection of Macedonian texts crawled from the .mk top-level domain. The embeddings are based on the skip-gram...
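fastText's skip-gram model differs from plain word2vec in that each word is also represented through its character n-grams, with `<` and `>` marking the word boundaries; this is what lets such models produce vectors for inflected or unseen word forms, which matters for morphologically rich languages like Macedonian. The sketch below only extracts those n-grams; `minn=3` and `maxn=6` are fastText's defaults for unsupervised training, and whether the CLARIN.SI models used them is not stated in this entry. `char_ngrams` is a hypothetical helper, not part of the fastText API.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams (lengths minn..maxn) of a word wrapped in '<' and '>',
    mirroring the subword scheme of fastText. Operates on Unicode characters,
    so Cyrillic letters are never split."""
    wrapped = f"<{word}>"
    grams: List[str] = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams
```

For example, `char_ngrams("дом")` yields `<до`, `дом`, `ом>`, `<дом`, `дом>` and `<дом>`.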
SimLex-999 Slovenian translation SimLex-999-sl 1.0
The resource contains the English SimLex-999 word pairs (Hill et al. 2015) and their Slovene translations. In the translation process, the word pairs were first translated by two translators...
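SimLex-style datasets are commonly used to evaluate word embeddings by computing Spearman's rank correlation between the human similarity ratings of the word pairs and the model's cosine similarities for the same pairs. The sketch below assumes that evaluation protocol (this entry does not prescribe one) and implements Spearman's ρ as the Pearson correlation of average ranks, so ties are handled correctly. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)
```

In use, `x` would hold the human ratings for each translated pair and `y` the cosine similarities from an embedding model; pairs with out-of-vocabulary words have to be dropped or scored by a fixed convention before the call.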
Word embeddings CLARIN.SI-embed.mk 2.0
CLARIN.SI-embed.mk contains word embeddings induced from a large collection of Macedonian texts crawled from the .mk top-level domain. The embeddings are based on the skip-gram...
Word embeddings CLARIN.SI-embed.hr 2.0
CLARIN.SI-embed.hr contains word embeddings induced from a large collection of Croatian texts composed of the Croatian web corpus hrWaC, a 400-million-token collection of...
Word embeddings CLARIN.SI-embed.sr 1.0
CLARIN.SI-embed.sr contains word embeddings induced from the srWaC web corpus. The embeddings are based on the skip-gram model of fastText trained on 554,606,544 tokens of...
Word embeddings CLARIN.SI-embed.sr 2.0
CLARIN.SI-embed.sr contains word embeddings induced from the srWaC and MaCoCu-sr web corpora. The embeddings are based on the skip-gram model of fastText trained on...
ELMo embeddings model, Slovenian
ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus...
CroSloEngual BERT
Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. A state-of-the-art tool representing...
CroSloEngual BERT 1.1
Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. A state-of-the-art tool representing...
Ekspress news article archive (in Estonian and Russian) 1.0
The dataset is an archive of articles from the Ekspress Meedia news site from 2009 to 2019, containing over 1.4M articles, mostly in Estonian (1,115,120 articles), with...