-
CEN
The Corpus of Economic News (CEN) contains 797 documents from Polish Wikipedia annotated with 65 categories of proper names in CCL format.... -
The system of the diagnostics in plWordNet
The PDF document describes the most frequent regular errors in plWordNet and the rules for their semi-automatic correction. -
TreeHopper (TreeLSTM): sentence- and phrase-level sentiment
A Tree-LSTM-based dependency tree sentiment labeler -
Polish-Ukrainian Parallel Corpus
Polish-Ukrainian Parallel Corpus -
Corpus_Sienkiewicz_Novels
Sienkiewicz Novels -
MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of ...
MultiEmo is a new benchmark data set for the multilingual sentiment analysis task, covering 11 languages. The collection contains consumer reviews from four domains: medicine,... -
PoLitBert_v32k_cos1_2_50k - Polish RoBERTa model
Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar. -
WordnetLoom
WordnetLoom is a wordnet editor application built to support the construction of plWordNet, the largest Polish wordnet. WordnetLoom provides two means of... -
Polimorf
PoliMorf is a morphological dictionary for Polish resulting from the standardization and merger of Morfeusz SGJP and Morfologik. The present version includes extended... -
Plumper
Ontology mapper. Mapping plWordNet onto SUMO ontology. -
Open license texts sample
Sample corpus of texts distributed under an open license. It consists of 20 documents in TXT, DOCX, DOC or ODT format. -
KGR10 FastText Polish word embeddings
Distributional language model (both textual and binary) for Polish (word embeddings) trained on the KGR10 corpus (over 4 billion words) using FastText with the following variants... -
Big Data language model in FastText CBOW format
Big Data language model in FastText CBOW format -
MWE 10 Largest
dabrowska_nocednie3_1933.txt prus_emancypantki_1894.txt sienkiewicz_ogniem_1884.txt kaczkowski_grob_1857.txt prus_faraon_1897.txt sienkiewicz_rodzina_1894.txt... -
Street name changes in Poznań, Słubice and Zbąszyń, Poland 1916-2018
The corpus presents a historical overview of street and place (park, bridge, square) name changes in the years 1916-2018 for three Polish cities: Poznań, Słubice and Zbąszyń.... -
POLFIE-OT: an LFG grammar of Polish with OT marks
POLFIE-OT is a version of POLFIE, an LFG grammar of Polish implemented in the XLE system (Xerox Linguistic Environment), enriched with OT (Optimality Theory) constraints for the... -
WCRFT WebLichtService
WCRFT service for WebLicht -
Wiki train - 34 categories
Wikipedia, 34 categories: a training set for a classifier -
ENIAMtoolkit
ENIAMtoolkit is a collection of libraries that perform tokenization, lemmatization, and part-of-speech tagging; detect MWEs and abbreviations; and split text into sentences. -
Vector Extractor
The collocations presented are based on co-occurrences of a selected noun with several features that describe it and are linked to it by syntactic dependencies. The recognised features...