CLARIN - Repositories

Kolipsi-2 Corpus v1.1

The Kolipsi-2 Corpus is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI II...

e-LIS: Electronic Bilingual Dictionary Italian Sign Language (LIS) – Italian ...

Legacy files of the former Electronic Bilingual Dictionary Italian Sign Language (LIS) - Italian, the first prototype of an online Italian Sign Language reference dictionary...

MERLIN Written Learner Corpus for Czech, German, Italian 1.0

The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR)...

Kolipsi-2 Corpus v1.0

The Kolipsi-2 Corpus is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI II...

PAISÀ Corpus of Italian Web Text

The Paisà corpus is a large collection of Italian web texts, licensed under Creative Commons (Attribution-ShareAlike and Attribution-Noncommercial-ShareAlike). It has been...

Kolipsi-1 Corpus v1.1

The Kolipsi-1 L2 is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI project...

VinKo (Varieties in Contact) Corpus v1.2

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

DIDI - The DiDi Corpus of South Tyrolean CMC 1.0.0

The DiDi corpus has an overall size of around 600,000 tokens gathered from 136 South Tyrolean Facebook users who participated in the DiDi project. It consists of 11,102 Facebook...

MERLIN Written Learner Corpus for Czech, German, Italian 1.1

The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR)...

KONTATTO v1.0

Kontatto is a corpus of transcribed and annotated spoken data collected by Silvia Dal Negro at the Free University of Bozen/Bolzano. It consists of almost 150,000 orthographic...

Kolipsi-1 Corpus v1.0

The Kolipsi-1 L2 is a written learner corpus of German and Italian L2 speakers originating from South Tyrol (Italy). It has been developed as a by-product of the KOLIPSI project...

LEONIDE - Longitudinal Learner Corpus in Italiano, Deutsch and English 1.1

LEONIDE is a longitudinal corpus of student essays documenting the language competences and writing development of lower secondary school students in three different languages....

LEKO v1.0

The LEKO corpora LEKO_Kolipsi and LEKO_Merlin provide lexical annotations for phraseological elements in Italian L2 writing on the basis of a subset of the texts of the...

AThEME Verona-Trento Corpus

The AThEME Verona-Trento Corpus is a spoken corpus composed of data collected during the AThEME project in Work Package 2 ‘Regional Languages’ by the units of Verona and Trento...

MT@BZ translation corpus v1.0

MT@BZ is a translation corpus that consists of 52 decrees published by the Autonomous Province of Bolzano (South Tyrol) aligned with their machine-translated versions. More...

MT@BZ annotation guidelines v1.0

The MT@BZ annotation guidelines are guidelines for assessing the quality of legal Italian-German machine translation. In particular, they cover the South Tyrolean German variety. They...

VinKo (Varieties in Contact) Corpus v1.0

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

VinKo (Varieties in Contact) Corpus v1.1

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

218 datasets found