Dataset - B2FIND

The Diorisis Ancient Greek Corpus

An annotated corpus of literary Ancient Greek sourced from the Perseus Canonical Greek Lit repository (https://github.com/PerseusDL/canonical-greekLit), “The Little Sailing”...

Universal Dependencies 1.2 Models for UDPipe

Tokenizer, POS Tagger, Lemmatizer and Parser models for all Universal Dependencies 1.2 Treebanks, created solely using UD 1.2 data (http://hdl.handle.net/11234/1-1548). To use...

CoNLL 2017 Shared Task - UDPipe Baseline Models and Supplementary Materials

Baseline UDPipe models for CoNLL 2017 Shared Task in UD Parsing, and supplementary material. The models require UDPipe version at least 1.1 and are evaluated using the official...

CALEM (Comprehensive Arabic LEMmas)

Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic lemmas and their corresponding inflected word forms (stems) with details (POS + Root). Each lexical...

EvaLatin 2020 models for UDPipe 2 (2020-08-31)

POS Tagger and Lemmatizer models for EvaLatin2020 data (https://github.com/CIRCSE/LT4HALA). The model documentation including performance can be found at...

Universal Dependencies 2.12 models for UDPipe 2 (2023-07-17)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 131 treebanks of 72 languages of Universal Dependencies 2.12 Treebanks, created solely using UD 2.12 data...

UDPipe

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given only...

The Model latinpipe-evalatin24-240520 for LatinPipe 2024

The latinpipe-evalatin24-240520 is a PhilBerta-based model for LatinPipe 2024 (https://github.com/ufal/evalatin2024-latinpipe), performing tagging, lemmatization, and dependency...

CoNLL 2018 Shared Task - UDPipe Baseline Models and Supplementary Materials

Baseline UDPipe models for CoNLL 2018 Shared Task in UD Parsing, and supplementary material. The models require UDPipe version at least 1.2 and are evaluated using the official...

Persian Morphologically Segmented Lexicon 0.5

This dataset includes 45300 Persian word forms which are manually segmented into sequences of morphemes.

Universal Dependencies 2.0 Models for UDPipe (2017-08-01)

Tokenizer, POS Tagger, Lemmatizer and Parser models for all 50 languages of Universal Dependencies 2.0 Treebanks, created solely using UD 2.0 data...

Universal Dependencies 2.3 Models for UDPipe (2018-11-15)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 84 treebanks of 56 languages of Universal Dependencies 2.3 Treebanks, created solely using UD 2.3 data...

Universal Dependencies 2.10 models for UDPipe 2 (2022-07-11)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 123 treebanks of 69 languages of Universal Dependencies 2.10 Treebanks, created solely using UD 2.10 data...

Universal Dependencies 2.5 Models for UDPipe (2019-12-06)

Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Dependencies 2.5 Treebanks, created solely using UD 2.5 data...

Prague Dependency Treebank 3.5

The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied...

35 datasets found