-
Universal Dependencies 2.9
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Universal Dependencies 2.11
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
JIRS
JIRS is a Passage Retrieval system specially suited for Question Answering. It could be adapted to other languages very easily. ask (Written Language): Information Retrieval... -
Deep Universal Dependencies 2.8
Deep Universal Dependencies is a collection of treebanks derived semi-automatically from Universal Dependencies (http://hdl.handle.net/11234/1-3687). It contains additional... -
Annotated corpora and tools of the PARSEME Shared Task on Automatic Identific...
This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make... -
Basic vocabulary on the Human Genome
A vocabulary resulting from the cooperation of the groups of REALITER network that collects the basic terminology mostly used in texts about Genomics. It contains equivalents in... -
Universal Derivations v1.1
Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent... -
CorpusExplorer
Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks... -
PAROLE-SIMPLE-CLIPS
55,000 entries, XML -
CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
CoNLL 2017 and 2018 shared tasks: Multilingual Parsing from Raw Text to Universal Dependencies. This package contains the test data in the form in which they were presented to... -
SenTube
Sentiment analysis of YouTube videos with joint models of text and speech -
Universal Dependencies 2.4 Models for UDPipe (2019-05-31)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of Universal Dependencies 2.4 Treebanks, created solely using UD 2.4 data... -
Deltacorpus 1.1
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger... -
Amara - universal subtitles
Large set of subtitles available for download in multiple languages. Can be used as parallel corpus. -
L2 Acquisition P-Moll Norbert Dittmar
Language Acquisition corpus -
OmegaWiki
This dataset has no description
-
Universal Dependencies 2.13
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual... -
Universal Segmentations 1.0 (UniSegments 1.0)
Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation... -
Corpus of Italian Emblem Books
Italian emblem books from the Stirling Maxwell Collection (University of Glasgow). Transcribed text and photographic reproductions. Searchable and browsable online. -
Universal Dependencies 2.15 models for UDPipe 2 (2024-11-21)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 147 treebanks of 78 languages of Universal Dependencies 2.15 Treebanks, created solely using UD 2.15 data...