Dataset - B2FIND

EvaLatin 2020 models for UDPipe 2 (2020-08-31)

POS Tagger and Lemmatizer models for EvaLatin2020 data (https://github.com/CIRCSE/LT4HALA). The model documentation including performance can be found at...

Universal Dependencies 2.12 models for UDPipe 2 (2023-07-17)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 131 treebanks of 72 languages of Universal Dependencies 2.12 Treebanks, created solely using UD 2.12 data...

Project Gutenberg

Free electronic books available for download or online browsing; German-language literature makes up only a...

The Model latinpipe-evalatin24-240520 for LatinPipe 2024

The latinpipe-evalatin24-240520 is a PhilBerta-based model for LatinPipe 2024 https://github.com/ufal/evalatin2024-latinpipe, performing tagging, lemmatization, and dependency...

Corpus Thomisticum

"A scholarly edition of Aquinas's Opera omnia, with a lexical database, a dictionary, two collections of historical sources, and an extensive bibliography."

Plaintext Wikipedia dump 2018

Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at...

Universal Dependencies 2.8.1

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Deep Universal Dependencies 2.7

Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3424). It contains additional...

Digitale Sammlungen der Universitäts- und Landesbibliothek Münster

Digital copies of historical books and journals from the ULB Münster; collections from the region of Westphalia; image digitizations of books and journals from the...

A Human-Annotated Dataset of Scanned Images and OCR Texts from Medieval Docum...

These are supplementary materials for an open dataset of scanned images and OCR texts from 19th and 20th century letterpress reprints of documents from the Hussite era. The...

Wortschatz

Collected from newspaper texts, web crawling, etc.: words (+frequency), co-occurrences (+graph), left/right neighbours, example sentences

Universal Dependencies 2.4

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Medieval Charter Sections Corpus

This package provides an evaluation framework, training and test data for semi-automatic recognition of sections of historical diplomatic manuscripts. The data collection...

CoNLL 2018 Shared Task System Outputs

Test data parsed by systems submitted to the CoNLL 2018 UD parsing shared task.

Universal Dependencies 2.8

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.2

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.6

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Deep Universal Dependencies 2.4

Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-2988). It contains additional...

Universal Dependencies 1.4

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.7

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

907 datasets found