Dataset - B2FIND

Slovene corpus for aspect-based sentiment analysis - SentiCoref 1.0

The SentiCoref 1.0 corpus consists of 837 documents selected from the SentiNews 1.0 corpus (http://hdl.handle.net/11356/1110). The documents were selected based on the number of...

Training corpus SUK 1.0

The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...

Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0

Knowledge-Enhanced Winograd Schema Challenge KE-WSC is an upgraded version of the original WSC dataset. It includes the following extensions: Annotation of semantically or...

Training corpus SUK 1.1

The SUK training corpus contains about 1 million tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation, with...

Slovene coreference resolution corpus coref149

This corpus contains a subset of the ssj500k v1.4 corpus (http://hdl.handle.net/11356/1052). Each of the 149 documents contains a paragraph from ssj500k that contains at least 100...

CorefUD conversion of Slovene coreference resolution corpus coref149

This corpus is the CorefUD conversion of the coref149 corpus for coreference resolution in Slovene (http://hdl.handle.net/11356/1182). It contains 149 documents annotated with...

CorefUD conversion of Slovene corpus for aspect-based sentiment analysis Sent...

This corpus is the CorefUD conversion of the SentiCoref corpus for coreference resolution in Slovene contained within the SUK 1.1 collection of corpora...

PyTorch model for Slovenian Coreference Resolution

Slovenian model for coreference resolution: a neural network based on a customized transformer architecture, usable with the code published on...

CorPipe 23 multilingual CorefUD 1.2 model (corpipe23-corefud1.2-240906)

The corpipe23-corefud1.2-240906 is an mT5-large-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released...

DiscoMT 2015 Shared Task on Pronoun Translation

The data set includes training, development and test data from the shared tasks on pronoun-focused machine translation and cross-lingual pronoun prediction from the EMNLP 2015...

CorPipe 24 Multilingual CorefUD 1.2 Model (corpipe24-corefud1.2-240906)

The corpipe24-corefud1.2-240906 is an mT5-large-based multilingual model for coreference resolution usable in CorPipe 24 (https://github.com/ufal/crac2024-corpipe). It is...

CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)

The corpipe23-corefud1.1-231206 is an mT5-large-based multilingual model for coreference resolution usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is...

DutchParliament dataset annotated with coreference links

A dataset of 74 documents containing records of parliamentary proceedings from the Dutch Tweede Kamer between 2015 and 2020. The data has been manually annotated with...

HyperCoref Corpus Seed Pages

Archive containing the seed URLs for recreating the "HyperCoref" corpus, an automatically extracted corpus of cross-document event coreference links in online news. Further...

Evaluation of neural coreference annotation of simplified German

This poster presents our evaluation of a neural coreference resolver (Schröder et al. 2021) on simplified German texts as well as the results of an annotation study that we...

15 datasets found