Dataset - B2FIND

tokenizer

Tokenizer is a tool with wich one can design dedicated tokenizers for texts from some domain of interest.

ENIAMtoolkit

ENIAMtoolkit is a collection of libraries that: - perform tokenization, lemmatization, part of speech tagging; - detect MWE and abbreviations; - split text into sentences.

ENIAMtoolkit (2017-03-06)

ENIAMtoolkit is a collection of libraries that: - perform tokenization, lemmatization, part of speech tagging; - detect MWE and abbreviations; - split text into sentences; - LCG...

Prague Dependency Treebank 3.5

The Prague Dependency Treebank 3.5 is the 2018 edition of the core Prague Dependency Treebank (PDT). It contains all PDT annotation made at the Institute of Formal and Applied...

CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data

CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies This package contains the test data in the form in which they ware presented to...

Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)

A richly annotated and genre-diversified language resource, The Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C in short in the sequel) is a consolidated...

MorphoDiTa: Morphological Dictionary and Tagger

MorphoDiTa: Morphological Dictionary and Tagger is an open-source tool for morphological analysis of natural language texts. It performs morphological analysis, morphological...

Java Porting of OpeNER tokenizer web service for WebLicht

Test WebLicht (https://weblicht.sfs.uni-tuebingen.de/) registry entry for Opener webservice comprising tokenisation only.

8 datasets found