CLARIN - Repositories

Linguistic digital repository based on DSpace 5.2

One of the goals of LINDAT/CLARIN Centre for Language Research Infrastructure is to provide technical background to institutions or researchers who want to share their tools...

Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

EngVallex - English Valency Lexicon

EngVallex is the English counterpart of the PDT-Vallex valency lexicon, using the same view of valency, valency frames and the description of a surface form of verbal arguments....

Universal Dependencies 2.15

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

SynSemClass 5.1

The SynSemClass synonym verb lexicon version 5.1 is a multilingual resource that enriches previous editions of this event-type ontology with a new language, Spanish. The...

Universal Dependencies 1.0

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

SynSemClass 2.0

The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language...

SynSemClass 1.0

The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language...

Universal Dependencies 1.2

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Universal Dependencies 2.5

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings

Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together...

Open SDP

The original SDP 2014 and 2015 data collections were made available under task-specific ‘evaluation’ licenses to registered SemEval participants. In mid-2016, all original data...

Universal Dependencies 1.1

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

VIADAT-STAT (2019-12-31)

A VIADAT module; the purpose of VIADAT-STAT is statistical analysis of recordings stored by the platform. Developed in cooperation with ÚSD AV ČR and NFA.

VIADAT-ANALYZE

A VIADAT module; VIADAT-ANALYZE is an interactive environment that enables the end user to work with stored, annotated and indexed audio recordings, allowing visualization and...

MorfFlex CZ 2.0

MorfFlex CZ 2.0 is the Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex is a flat list of...

Universal Dependencies 2.8.1

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

VIADAT-ANNOTATE (2019-12-31)

A VIADAT module; VIADAT-ANNOTATE is an interactive annotation environment. Developed in cooperation with ÚSD AV ČR and NFA.

SynSemClass 3.5

The SynSemClass 3.5 synonym verb lexicon investigates semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English and German-English language...

Prague DaTabase of Spoken Czech 1.0

PDTSC 1.0 is a multi-purpose corpus of spoken language. 768,888 tokens, 73,374 sentences and 7,324 minutes of spontaneous dialog speech have been recorded, transcribed and...

102 datasets found