Dataset - B2FIND

Neural Rerankers for Dependency Parsing

This resource contains code for different types of neural rerankers (RCNN, RCNN-shared and GCN) from the paper: Do and Rehbein (2020). "Neural Reranking for Dependency Parsing:...

Real-World PP Attachment Disambiguation Dataset

This resource contains a German dataset for real-world PP attachment disambiguation. The creation, analysis and experiment results of the dataset are described in the paper: Do...

Lexicon of Abusive Words (EN)

This gold standard contains a bootstrapped lexicon of abusive words. The lexicon comprises a large set of English negative polar expressions annotated as either abusive or not.

Sentiment Compound Data (DE)

This dataset contains gold standards that are required for building a classifier that automatically extracts opinion (noun) compounds.

A harmonised testsuite for social media POS tagging (DE)

A harmonised POS testsuite of web data, CMC and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). UD POS tags have been automatically...

Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML

“Cataloging Cultural Objects - a Guide to Describing Cultural Works and Their Images” (CCO) provides a data content standard for catalogers of cultural heritage. It is a...

DeModify

deModify consists of 3631 instances, each with three annotations obtained through CrowdFlower. An instance is a short story in which a modifier is annotated with respect to its...

The MSC Data Set

From this page you can download resources we created for modal sense classification as reported in Zhou et al. (2015), Marasović et al. (2016) and Marasović and Frank (2015)...

Twitter Titling Corpus

The Twitter Titling Corpus contains 4002 stance-annotated tweets collected between 20 June 2017 and 30 August 2017 mentioning 6 presidents. Each tweet is annotated for the...

X-SRL Dataset and mBERT Word Aligner

This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source...

Converter for content-to-head style syntactic dependencies

A set of Python scripts that convert function-head style encodings in dependency treebanks into a content-head style encoding (as used in the UD treebanks) and vice versa (for...

Multilingual Modal Sense Classification using a Convolutional Neural Network ...

Abstract: Modal sense classification (MSC) is a special WSD task that depends on the meaning of the proposition in the modal’s scope. We explore a CNN architecture for...

Datasets for Dependency Tree Reranking

This resource contains the datasets for dependency tree reranking in 3 languages: English, German and Czech. The creation, analysis and experiment results of the datasets are...

COREC – A neural multi-label COmmonsense RElation Classification system

We examine the learnability of commonsense knowledge relations as represented in CONCEPTNET. We develop a neural open-world multi-label classification system that focuses on the...

Neural Dependency Parser with Biaffine Attention

This resource contains the code of the dependency parser used in the paper: Fankhauser et al. (2020). "Evaluating a Dependency Parser on DeReKo". The parser is a...

Neural PP Attachment Disambiguation Systems

This resource contains code for different types of neural PP attachment disambiguation systems: A disambiguation system inspired by de Kok et al. (2017) but with the ranking...

KGE Algorithms

An updated method for link prediction that uses a regularization factor that models relation argument types. Abstract (Kotnis and Nastase, 2017): Learning relations based on...

LIDO-Handbuch für die Erfassung und Publikation von Metadaten zu kulturellen ...

LIDO (Lightweight Information Describing Objects) is an XML schema for the standards-compliant provision of metadata about cultural objects in a wide variety of...

Source Code, Data and Additional Material for the Thesis: "Social Commonsense...

Understanding a social situation requires the ability to reason about the underlying emotions and behaviour of others. For example, when we read a personal story, we use our...

MACE-AL

A method for detecting noise in automatically annotated sequence-labelled data, combining MACE (Hovy et al. 2013) with Active Learning.

3,263 datasets found