-
Frequency lists of pivot words and GSE counts
The resource contains data used to estimate the number of words in Lithuanian texts indexed by the selected Global Search Engines (GSE), namely Google (by Alphabet Inc.), Bing... -
Wizerunek Andreja Babiša i Mateusza Morawieckiego w kontekście sytuacji kryzy...
A collection of articles from the Czech press about Mateusz Morawiecki (iDnes) and from the Polish press about Andrej Babiš (Rzeczpospolita) -
fronda
Selected texts from fronda.pl -
KGR10-RoBERTa
Polish RoBERTa model pre-trained on KGR10 corpora. -
Inforex
Inforex is a web-based system designed for managing and annotating text corpora on the semantic level including annotation of Named Entities (NE), anaphora, Word Sense... -
Liner2
Recognizes proper names in Polish text. -
zmiany klimatu kraków
Workshops in Kraków - sociology -
CorpoGrabber
CorpoGrabber: The Toolchain to Automatic Acquiring and Extraction of the Website Content. Jan Kocoń, Wroclaw University of Technology. CorpoGrabber is a pipeline of tools to get... -
Polish WSD Datasets
Data and code for the paper published at ICCS 2022: "A Unified Sense Inventory for Word Sense Disambiguation in Polish". The code is available at... -
ELMo Embeddings for Polish
A model of ELMo embeddings for the Polish language, trained on large textual corpora (KGR10). To retrain the model, please use the checkpoint and vocabulary files available at:... -
MWE Świętochowski
Aleksander Świętochowski -
1990_Skubiszewski
The first exposé of the Ministry of Foreign Affairs (MSZ) of the Third Polish Republic -
Word Embeddings for Polish
Distributional language models for Polish trained on different corpora (KGR10, NKJP, Wikipedia). -
AspectEmo 1.0: Multi-Domain Corpus of Consumer Reviews for Aspect-Based Senti...
AspectEmo 1.0 Corpus is an extended version of the publicly available PolEmo 2.0 corpus of Polish customer reviews, which was used in many projects on the use of different methods... -
MWE Sienkiewicz, Ogniem i mieczem
Henryk Sienkiewicz -
Big Data language model - STEMMED - RAW data
Big Data language model, stemmed, in RAW format -
KPWr annotation guidelines - coreference
Coreference annotation guidelines describing the process of manually annotating documents in the Polish Corpus of Wrocław University of Technology (KPWr) -
Wikinews_luty_marzec_2020
Test corpus _ 3_03_20 -
PELCRA PARL corpus
The corpus comprises 50 sampled recordings (12 hours) and manual transcriptions (ca. 101 00 word tokens) of parliamentary data.