102 datasets found

Creator: Hajič, Jan

  • Linguistic digital repository based on DSpace 5.2

    One of the goals of the LINDAT/CLARIN Centre for Language Research Infrastructure is to provide technical background to institutions or researchers who want to share their tools...
  • Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data

    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
  • EngVallex - English Valency Lexicon

    EngVallex is the English counterpart of the PDT-Vallex valency lexicon, using the same view of valency, valency frames and the description of a surface form of verbal arguments....
  • Universal Dependencies 2.15

    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
  • SynSemClass 5.1

    The SynSemClass synonym verb lexicon version 5.1 is a multilingual resource that enriches previous editions of this event-type ontology with a new language, Spanish. The...
  • Universal Dependencies 1.0

    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
  • SynSemClass 2.0

    The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language...
  • SynSemClass 1.0

    The SynSemClass synonym verb lexicon is a result of a project investigating semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language...
  • Universal Dependencies 1.2

    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
  • Universal Dependencies 2.5

    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
  • CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings

    Automatic segmentation, tokenization and morphological and syntactic annotations of raw texts in 45 languages, generated by UDPipe (http://ufal.mff.cuni.cz/udpipe), together...
  • Open SDP

    The original SDP 2014 and 2015 data collections were made available under task-specific ‘evaluation’ licenses to registered SemEval participants. In mid-2016, all original data...
  • Universal Dependencies 1.1

    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
  • VIADAT-STAT (2019-12-31)

    A VIADAT module; the purpose of VIADAT-STAT is statistical analysis of recordings stored by the platform. Developed in cooperation with ÚSD AV ČR and NFA.
  • VIADAT-ANALYZE

    A VIADAT module; VIADAT-ANALYZE is an interactive environment that enables the end user to work with stored, annotated and indexed audio recordings, allowing visualization and...
  • MorfFlex CZ 2.0

    MorfFlex CZ 2.0 is the Czech morphological dictionary developed originally by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex is a flat list of...
  • Universal Dependencies 2.8.1

    Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...
  • VIADAT-ANNOTATE (2019-12-31)

    A VIADAT module; VIADAT-ANNOTATE is an interactive annotation environment. Developed in cooperation with ÚSD AV ČR and NFA.
  • SynSemClass 3.5

    The SynSemClass 3.5 synonym verb lexicon investigates semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English and German-English language...
  • Prague DaTabase of Spoken Czech 1.0

    PDTSC 1.0 is a multi-purpose corpus of spoken language. 768,888 tokens, 73,374 sentences and 7,324 minutes of spontaneous dialog speech have been recorded, transcribed and...