Dataset - B2FIND

CEN

The Corpus of Economic News (CEN) contains 797 documents from Polish Wikipedia annotated with 65 categories of proper names in CCL format....

The system of the diagnostics in plWordNet

The PDF document describes the most frequent regular errors in plWordNet and the rules for their semi-automatic correction.

TreeHopper (TreeLSTM): sentence- and phrase-level sentiment

A Tree-LSTM-based dependency tree sentiment labeler

Polish-Ukrainian Parallel Corpus

Corpus_Sienkiewicz_Novels

Sienkiewicz Novels

MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of ...

MultiEmo is a new benchmark dataset for multilingual sentiment analysis covering 11 languages. The collection contains consumer reviews from four domains: medicine,...

PoLitBert_v32k_cos1_2_50k - Polish RoBERTa model

Polish RoBERTa model trained on Polish Wikipedia, Polish literature and Oscar.

WordnetLoom

WordnetLoom is a wordnet editor application built to support the construction of plWordNet, the largest Polish wordnet. WordnetLoom provides two means of...

Polimorf

PoliMorf is a morphological dictionary for Polish resulting from the standardization and merger of Morfeusz SGJP and Morfologik. The present version includes extended...

Plumper

Ontology mapper that maps plWordNet onto the SUMO ontology.

Open license texts sample

Sample corpus of texts distributed under open license. It consists of 20 documents in TXT, DOCX, DOC or ODT format.

KGR10 FastText Polish word embeddings

Distributional language model (word embeddings, in both text and binary formats) for Polish, trained on the KGR10 corpus (over 4 billion words) using FastText with the following variants...

Big Data language model in FastText CBOW format

MWE 10 Największych

dabrowska_nocednie3_1933.txt prus_emancypantki_1894.txt sienkiewicz_ogniem_1884.txt kaczkowski_grob_1857.txt prus_faraon_1897.txt sienkiewicz_rodzina_1894.txt...

Street name changes in Poznań, Słubice and Zbąszyń, Poland 1916-2018

The corpus presents a historical overview of street and place (park, bridge, square) name changes in the years 1916-2018 for three Polish cities: Poznań, Słubice and Zbąszyń....

POLFIE-OT: an LFG grammar of Polish with OT marks

POLFIE-OT is a version of POLFIE, an LFG grammar of Polish implemented in the XLE system (Xerox Linguistic Environment), enriched with OT (Optimality Theory) constraints for the...

WCRFT WebLichtService

WCRFT service for WebLicht

Wiki train - 34 categories

Wikipedia, 34 categories: a training set for a classifier

ENIAMtoolkit

ENIAMtoolkit is a collection of libraries that:
- perform tokenization, lemmatization and part-of-speech tagging;
- detect multi-word expressions (MWEs) and abbreviations;
- split text into sentences.

Vector Extractor

The collocations presented are based on co-occurrences of a selected noun with several features that describe it and are linked to it by syntactic dependencies. The recognised features...

653 datasets found