Dataset - B2FIND

HamleDT 2.0

HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a...

Universal Dependencies 2.14

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.9

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.11

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

TITUS Latin

ca. 2.000 tokens; linked with relational database; XML-encoding in progress

CELT Corpus of Electronic Texts

searchable online corpus of multilingual texts of Irish literature and history

Deep Universal Dependencies 2.8

Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3687). It contains additional...

Universal Derivations v1.1

Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent...

Bosworth-Toller’s Anglo-Saxon Dictionary online

Description : This is an online edition of An Anglo-Saxon Dictionary, or a dictionary of "Old English". The dictionary records the state of the English language as it was used...

Kant-Korpus (Daten des Projekts Bereitstellung und Pflege von Immanuel Kants ...

Philosophical texts of the 18th century: Full text of the authoritative "Akademie-Ausgabe" (excluding most footnotes and editorial notes) and reference texts like A.G....

CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies This package contains the test data in the form in which they ware presented to...

De Latinae Linguae Reparatione treebank

This corpus contains the text of De Latinae Linguae Reparatione authored by Marcus Antonius Sabellicus (1436–1506), annotated with respect to lemmas, part-of-speech tags,...

Universal Dependencies 2.4 Models for UDPipe (2019-05-31)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of Universal Depenencies 2.4 Treebanks, created solely using UD 2.4 data...

Botanicus Digital Library

Digital copies of historical botanic papers from the Missouri Botanical Garden Library; Bilddigitalisate von historischen botanischen Schriften; deutschsprachige Texte stellen...

Deltacorpus 1.1

Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger...

Textsammlung von Thomas Gloning

i.a. collection of old herbal books, old cookery books and texts on the history of German language in print media; u.a. eine Sammlung von alten Kräuterbüchern, alten...

Universal Dependencies 2.13

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Segmentations 1.0 (UniSegments 1.0)

Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation...

Universal Dependencies 2.15 models for UDPipe 2 (2024-11-21)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 147 treebanks of 78 languages of Universal Depenencies 2.15 Treebanks, created solely using UD 2.15 data...

Universal Dependencies 2.10

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

907 datasets found