Dataset - B2FIND

Prague Dependency Treebank 2.0 Sample Data

This is a small sample dataset from PDT 2.0. As such it can be released under a very permissive CC-BY license.

Prague Dependency Treebank - Consolidated 2.0 (PDT-C 2.0)

A manually annotated and genre-diversified language resource with rich linguistic information from morphology and syntax to semantics, the Prague Dependency Treebank –...

Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)

A richly annotated and genre-diversified language resource, The Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C in short in the sequel) is a consolidated...

Lithuanian Treebank ALKSNIS (2019-10-24)

ALKSNIS v3.0. ALKSNIS v3,0 consists of 3,643 syntactically annotated sentences in the PML (Prague Mark-up Language) format. The format allows researchers to visualise and edit...

Lithuanian Treebank ALKSNIS

ALKSNIS v2.1 ALKSNIS v2.1 consists of 2,355 syntactically annotated sentences in the PML (Prague Mark-up Language) format. The format allows researchers to visualise and edit...

Składnica frazowa — a constituency treebank of Polish

Składnica frazowa is a constituency treebank of Polish. The treebank is a result of parsing Polish sentences with the syntactic parser Świgra. For every sentence, the parser...

POLFIE Bank, an LFG structure bank of Polish: pol-nkjp1m-pargram-dev

The pol-nkjp1m-pargram-dev structure bank was created using POLFIE: an LFG grammar of Polish. This structure bank contains sentences from the NKJP1M subcorpus of NKJP which were...

POLFIE Bank, an LFG structure bank of Polish: pol-składnica-pargram

The pol-składnica-pargram structure bank was created using POLFIE: an LFG grammar of Polish. This structure bank contains FULL type sentences from Składnica, which were in turn...

Poliqarp2

Poliqarp2 is a linguistic search engine, capable of searching through large corpora annotated on multiple levels. It is not an upgraded version of Poliqarp, it is a...

SALSA - The SAarbrücken Lexical Semantics Annotation and Analysis Project

The SALSA corpus is based on the TIGER corpus. The TIGER corpus (Version 2.1) consists of app. 900,000 tokens (50,000 sentences) of German newspaper text, taken from the...

Deep Sequoia corpus - PARSEME-FR corpus - FrSemCor

The Sequoia corpus is a set of 3,099 linguistically-annotated French sentences, originating from four sources (Europarl, European Agency Reports, French regional journal L'Est...

Coreference in Universal Dependencies 1.2 (CorefUD 1.2)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Universal Dependencies 2.0

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

Coreference in Universal Dependencies 0.1 (CorefUD 0.1)

CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version...

Prague Discourse Treebank 2.0

PDiT 2.0 is a new version of the Prague Discourse Treebank. It contains a complex annotation of discourse phenomena enriched by the annotation of secondary connectives.

Universal Dependencies 2.0 alpha (obsolete)

This release contains errors in several files. Please use http://hdl.handle.net/11234/1-1983 instead.

HamleDT 3.0

HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that...

HamleDT 2.0

HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a...

Tamil Dependency Treebank v0.1

Tamil Dependency Treebank version 0.1 (TamilTB.v0.1) is an attempt to develop a syntactically annotated corpora for Tamil. TamilTB.v0.1 contains 600 sentences enriched with...

Universal Dependencies 2.14

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual...

64 datasets found