Dataset - B2FIND

Replication Data for: Accusative of Negation in ‘Borderland’ Polish

These are the data for the journal article "Accusative of Negation in 'Borderland' Polish". The abstract of the article is below. The data consist of the annotated list of...

ChunkRel WS

ChunkRel-WS is a prototype service for recognition of three syntactic relations between chunks. The service may be run against plain text (input format: text), then the...

Big Data language model - subword - BPE - ARPA

Big data language model based on subword units obtained with byte pair encoding (BPE), in ARPA format.

Świgra — a parser of Polish

Świgra is a parser of Polish generating constituency trees using a DCG style grammar stemming from Marek Świdziński’s grammar “Gramatyka formalna języka polskiego” (1992). The...

Verb in plWordNet 4.0 (Guidelines)

The PDF document contains the guidelines for describing verbs in the Polish part of plWordNet.

Big data language model with part of speech tags stemmed in ARPA format

Pred-A-tor

Tool for creating predicate-argument structures based on syntactic trees produced by the Świgra parser (http://zil.ipipan.waw.pl/%C5%9Awigra)

Big Data language model - subword - SYLLABED - ARPA

Big data language model based on syllables in ARPA format.

Polish-Bulgarian Parallel Corpus

Extended dictionary of named entities NELexicon connected with Linked Open Data

This resource contains Polish named entities connected with terminology from available resources within Linked Open Data (e.g. WordNet, DBPedia, Wikipedia, etc.).

Big data language model stemmed in ARPA format

Big data language model stemmed in ARPA format.

Big data language model with part of speech tags stemmed in RAW format

Big data language model stemmed with BPE in ARPA format

WiKNN Text Classifier

WiKNN is an online text classifier service for Polish and English texts. It supports hierarchical labelled classification of user-submitted texts with Wikipedia categories....

MorphoDiTa-based tagger for Polish language

A MorphoDiTa-based tagger for the Polish language. It is a tool for morphosyntactic unification of Polish text, according to the NKJP tagset.

KGR10 FastText Polish word embeddings

Distributional language model (both textual and binary) for Polish (word embeddings) trained on the KGR10 corpus (over 4 billion words) using fastText with the following variants...

POLFIE Bank, an LFG structure bank of Polish: pol-nkjp1m-pargram-dev

The pol-nkjp1m-pargram-dev structure bank was created using POLFIE: an LFG grammar of Polish. This structure bank contains sentences from the NKJP1M subcorpus of NKJP which were...

PolEmo 2.0 Sentiment Analysis Dataset for CoNLL

PolEmo 2.0: Corpus of Multi-Domain Consumer Reviews, evaluation data for the article presented at CoNLL. Citation: @inproceedings{kocon-etal-2019-multi, title = "Multi-Level...

Polish-Ukrainian Parallel Corpus

Big Data language model - STEMMED - RAW data

Big data language model stemmed in RAW format
