KGR10 FastText Polish word embeddings

Dataset

Distributional language model (word embeddings) for Polish, available in both textual and binary form, trained on the KGR10 corpus (over 4 billion words) using FastText. All possible combinations of the following variants are provided:
- dimension: 100, 300
- method: skipgram, cbow
- tool: FastText, Magnitude
- source text: plain, plain.lower, plain.lemma, plain.lemma.lower
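Four options with 2, 2, 2 and 4 values give 2 x 2 x 2 x 4 = 32 combinations. The sketch below enumerates them in Python. The dictionary keys and the name `enumerate_variants` are illustrative assumptions and do not reflect the actual file names in the download directory.

```python
from itertools import product

# Option values as listed in the dataset description.
# The key names are hypothetical labels chosen for this sketch.
OPTIONS = {
    "dimension": [100, 300],
    "method": ["skipgram", "cbow"],
    "tool": ["FastText", "Magnitude"],
    "source_text": ["plain", "plain.lower", "plain.lemma", "plain.lemma.lower"],
}


def enumerate_variants(options=OPTIONS):
    """Return one dict per combination of option values."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]


variants = enumerate_variants()
print(len(variants))
```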

The link below leads to the NextCloud directory with all variants of the embeddings. If you use them, please cite the following article:

@article{kocon2018embeddings,
  author = {Koco\'{n}, Jan and Gawor, Micha{\l}},
  title = {Evaluating {KGR10} {P}olish word embeddings in the recognition of temporal expressions using {BiLSTM-CRF}},
  journal = {Schedae Informaticae},
  volume = {27},
  year = {2018},
  url = {http://www.ejournals.eu/Schedae-Informaticae/2018/Volume-27/art/13931/},
  doi = {10.4467/20838476SI.18.008.10413}
}
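Once a textual variant has been downloaded, it can be read without any third-party library. The sketch below assumes the usual FastText text layout: a header line with the vocabulary size and the dimension, then one line per word holding the token followed by its vector components. This layout is an assumption about the files in the directory, not something the record states. The names `load_text_vectors` and `cosine_similarity` are hypothetical helpers.

```python
import math


def load_text_vectors(path, limit=None):
    """Read word vectors from a text file in the assumed FastText .vec layout.

    Assumed layout: header "<vocab_size> <dimension>", then one line per
    word: the token followed by <dimension> space-separated floats.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        header = handle.readline().split()
        dimension = int(header[1])
        for line in handle:
            if limit is not None and len(vectors) >= limit:
                break
            # rstrip() drops the newline and any trailing space after the last value.
            parts = line.rstrip().split(" ")
            if len(parts) <= dimension:
                continue  # skip malformed or empty lines
            # Take the last <dimension> fields as the vector, the rest as the token.
            word = " ".join(parts[:-dimension])
            vectors[word] = [float(value) for value in parts[-dimension:]]
    return vectors


def cosine_similarity(left, right):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(left, right))
    norm = math.sqrt(sum(a * a for a in left)) * math.sqrt(sum(b * b for b in right))
    return dot / norm if norm else 0.0
```

A call such as `load_text_vectors("embeddings.vec", limit=50000)` would load the first 50,000 entries; the file name here is a placeholder. Loading the whole vocabulary into Python lists can use a lot of memory, so the `limit` argument is useful for quick experiments.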

Identifier
PID	http://hdl.handle.net/11321/606
Metadata Access	https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/606
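
The Metadata Access URL above is an OAI-PMH `GetRecord` request for Dublin Core (`oai_dc`) metadata. The sketch below, using only the Python standard library, rebuilds that URL from its parts and pulls Dublin Core fields out of a response body. The function names are hypothetical, and the parser assumes only that the fields live in the standard Dublin Core element namespace.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://clarin-pl.eu/oai/request"
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set


def build_get_record_url(identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for the given record identifier."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ":" and "/" unescaped so the result matches the URL listed in the record.
    return OAI_ENDPOINT + "?" + urlencode(params, safe=":/")


def extract_dc_fields(xml_text):
    """Collect Dublin Core elements from an XML document as {name: [values]}."""
    prefix = "{" + DC_NAMESPACE + "}"
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


url = build_get_record_url("oai:clarin-pl.eu:11321/606")
print(url)
```

The response body can be fetched with `urllib.request.urlopen(url).read()` and passed to `extract_dc_fields`; network access is left out of the sketch so that it runs offline.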

Provenance
Creator	Kocoń, Jan
Publisher	Wroclaw University of Science and Technology
Publication Year	2018
Rights	GNU GPL3; http://www.gnu.org/licenses/gpl-3.0.en.html; PUB
OpenAccess	true
Contact	clarin-pl(at)pwr.edu.pl

Representation
Language	Polish
Resource Type	languageDescription
Format	application/zip; downloadable_files_count: 1
Discipline	Linguistics