heiDATA - Repositories

Tool for Extracting PP Attachment Disambiguation Dataset

This resource contains code to extract a PP attachment disambiguation dataset as described in the paper: Do and Rehbein (2020). "Parsers Know Best: German PP Attachment...

Accompanying data for: "PIA 1. Bericht des Pilotprojekts Inwertsetzung Ausgrabungen...

1) Neolithic settlement Cleebronn "Langwiesen IV": photo documentation of flint finds 2) Early medieval burial ground Cleebronn "Langwiesen IV": feature drawings,...

Affixoid Dataset (DE)

The dataset contains the manual annotations for the COLING 2018 submission "Distinguishing affixoid formations from compounds" by Josef Ruppenhofer, Michael Wiegand, Rebecca...

Kaiserchronik - digital

The digital edition presents the entire manuscript transmission of the Kaiserchronik, both known and extant, in dual format: digital facsimiles of the manuscripts (where these...

3D Micro-Mapping of Subsidence Stations [Source Code and Data]

This dataset comprises the source code to reproduce the 3D micro-mapping tool for plane adjustment at subsidence stations. In this project, users adjust a plane (height and...

Comparison of perfusion models for quantitative T1 weighted DCE-MRI of rectal...

Twenty-six patients with newly diagnosed rectal carcinoma underwent 3T MRI of the pelvis including a T1 weighted dynamic contrast enhanced (DCE) protocol before treatment. For...

Neural Rerankers for Dependency Parsing

This resource contains code for different types of neural rerankers (RCNN, RCNN-shared and GCN) from the paper: Do and Rehbein (2020). "Neural Reranking for Dependency Parsing:...

Real-World PP Attachment Disambiguation Dataset

This resource contains a German dataset for real-world PP attachment disambiguation. The creation and analysis of the dataset, along with the experimental results, are described in the paper: Do...

Lexicon of Abusive Words (EN)

This gold standard contains a bootstrapped lexicon of abusive words. The lexicon comprises a large set of English negative polar expressions annotated as either abusive or not.

Gammertingen, St. Michael. Auswertung der archäologischen Ausgrabungen insbe...

The data collection comprises the database of features and finds for the archaeological analysis, published as a monograph, of the excavation in the St. Michaelskapelle in...

Sentiment Compound Data (DE)

This dataset contains gold standards that are required for building a classifier that automatically extracts opinion (noun) compounds.

Pooled clone collections by multiplexed CRISPR-Cas12a-assisted gene tagging ...

Data accompanying the paper "Pooled clone collections by multiplexed CRISPR-Cas12a-assisted gene tagging in yeast" by Buchmuller and Herbst et al., 2019, Nat Communications....

Katalog inschriftloser Monumente

The tables compiled here list the non-inscribed stone monuments collected while preparing my dissertation, from the study area...

A harmonised testsuite for social media POS tagging (DE)

A harmonised POS test suite of web data, CMC and Twitter microtext, with word forms and STTS POS tags (+ some additional CMC-specific tags). UD POS tags have been automatically...

Image-based screening for stress granule regulators

The supplied macro provides the opportunity to analyse large-scale image datasets, derived from image-based screening approaches. It can be used as a base for the segmentation...

Cataloging Cultural Objects (CCO) – The CCO Commons examples in VRA Core 4 XML

“Cataloging Cultural Objects - a Guide to Describing Cultural Works and Their Images” (CCO) provides a data content standard for catalogers of cultural heritage. It is a...

DeModify

deModify consists of 3631 instances, each with three annotations obtained through CrowdFlower. An instance is a short story in which a modifier is annotated with respect to its...

The MSC Data Set

From this page you can download resources we created for modal sense classification as reported in Zhou et al. (2015), Marasović et al. (2016) and Marasović and Frank (2015)...

Twitter Titling Corpus

The Twitter Titling Corpus contains 4002 stance-annotated tweets collected between 20 June 2017 and 30 August 2017 mentioning 6 presidents. Each tweet is annotated for the...

X-SRL Dataset and mBERT Word Aligner

This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source...

1,283 datasets found