Dataset - B2FIND

Klassifikation von Tragödien und Komödien bei Calderón de la Barca (Classification of Tragedies and Comedies in Calderón de la Barca)

Data publication accompanying the article "Klassifikation von Tragödien und Komödien bei Calderón de la Barca": spoken text of 64 plays by Pedro Calderón de la Barca,...

PoLitBert_v50k_linear_50k - Polish RoBERTa model

Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.

PoLitBert_v32k_tri_50k - Polish RoBERTa model

Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.

KGR10 FastText Polish word embeddings

Distributional language model (both textual and binary) for Polish (word embeddings) trained on the KGR10 corpus (over 4 billion words) using fastText with the following variants...

PoLitBert_v32k_linear_50k - Polish RoBERTa model

Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.

PoLitBert_v32k_tri_125k - Polish RoBERTa model

Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.

TimeAssign

TimeAssign is a program that recognizes temporal expressions and assigns TimeML labels to words in Polish text using a Bi-LSTM-based neural network and wordform embeddings.

Word embeddings for Polish (KGR10, Fasttext binary) kgr10_fasttext_bin_v1

Distributional language model (binary) for Polish trained on KGR10 using fastText (vector dimension: 100).

PoLitBert_v32k_linear_125k - Polish RoBERTa model

Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.

Word embeddings CLARIN.SI-embed.sl 2.0

CLARIN.SI-embed.sl contains word embeddings induced from a large collection of Slovene texts composed of existing corpora of Slovene, e.g., GigaFida, Janes, KAS, slWaC, MaCoCu-sl,...

Word embeddings CLARIN.SI-embed.mk 0.1

CLARIN.SI-embed.mk contains word embeddings induced from a large collection of Macedonian texts crawled from the .mk top-level domain. The embeddings are based on the skip-gram...

SimLex-999 Slovenian translation SimLex-999-sl 1.0

The resource contains the English SimLex-999 word pairs (Hill et al. 2015) and their Slovene translations. In the translation process, the word pairs were first translated by two translators...

Word embeddings CLARIN.SI-embed.mk 2.0

CLARIN.SI-embed.mk contains word embeddings induced from a large collection of Macedonian texts crawled from the .mk top-level domain. The embeddings are based on the skip-gram...

Word embeddings CLARIN.SI-embed.hr 2.0

CLARIN.SI-embed.hr contains word embeddings induced from a large collection of Croatian texts composed of the Croatian web corpus hrWaC, a 400-million-token-heavy collection of...

Word embeddings CLARIN.SI-embed.sr 1.0

CLARIN.SI-embed.sr contains word embeddings induced from the srWaC web corpus. The embeddings are based on the skip-gram model of fastText trained on 554,606,544 tokens of...

Word embeddings CLARIN.SI-embed.sr 2.0

CLARIN.SI-embed.sr contains word embeddings induced from the srWaC and MaCoCu-sr web corpora. The embeddings are based on the skip-gram model of fastText trained on...

ELMo embeddings model, Slovenian

ELMo language model (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on the entire Gigafida 2.0 corpus...

CroSloEngual BERT

Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. State-of-the-art tool representing...

CroSloEngual BERT 1.1

Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. State-of-the-art tool representing...

Ekspress news article archive (in Estonian and Russian) 1.0

The dataset is an archive of articles from the Ekspress Meedia news site from 2009 to 2019, containing over 1.4M articles, mostly in Estonian (1,115,120 articles) with...

37 datasets found