Dataset - B2FIND

A corpus of Slavic dialects in Albania

The user-friendly version of the Corpus with search options is available here. These are the main parameters of the corpus:...

HELLO CAMPANIA! Philippines Collection

The Philippines collection contains data for 66 speakers: 32 first generation (G1), 28 second generation (G2), 6 homeland (G0). The collection contains three folders for each...

Map task corpus of heritage BCMS 1.0

The Map task corpus of heritage Bosnian/Croatian/Montenegrin/Serbian (BCMS) consists of elicited conversations (map tasks) by 29 second-generation BCMS speakers originating from...

Posts of German PC Games Online Forum

Contains linguistically annotated data from the online forum PC Games (https://forum.pcgames.de). The forum is about gaming. All posts (approx. 2.4 million) were scraped in...

VinKo (Varieties in Contact) Corpus v1.2

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

KONTATTO v1.0

Kontatto is a corpus of transcribed and annotated spoken data collected by Silvia Dal Negro at the Free University of Bozen/Bolzano. It consists of almost 150,000 orthographic...

AThEME Verona-Trento Corpus

The AThEME Verona-Trento Corpus is a spoken corpus composed of data collected during the AThEME project in Work Package 2 ‘Regional Languages’ by the units of Verona and Trento...

VinKo (Varieties in Contact) Corpus v1.0

VINKO is a spoken corpus based on crowdsourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

VinKo (Varieties in Contact) Corpus v1.1

VINKO is a spoken corpus based on crowd-sourced audio recordings that has been designed to provide relevant linguistic information about the minority languages and dialects...

INEL Nenets Corpus

Corpus citation: Budzisch, Josefina; Wagner-Nagy, Beáta. 2024. INEL Nenets Corpus. Version 1.0. Publication date 2024-12-31....

INEL Enets Corpus

Corpus citation: Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Beáta. 2024. INEL Enets Corpus. Version 1.0. Publication date 2024-11-30....

INEL Evenki Corpus

Corpus citation: Däbritz, Chris Lasse & Gusev, Valentin. 2021. INEL Evenki Corpus. Version 1.0. Publication date 2021-12-31. Archived at Universität Hamburg....

Replication Data for: Russian verbal borrowings in Udmurt

This is the dataset used in a study of Russian verbal loans in Udmurt. The files contain lists of Russian verbs found in the Udmurt social media corpus...

Türkisch-Englisch-Deutsch bei Herkunftssprechern (TEDH)

The TEDH has been created as part of the project "Foreign Language Acquisition in German-Turkish bilinguals". The TEDH Corpus contains interviews in three languages:...

Hamburg Corpus of Argentinean Spanish (HaCASpa)

Audio and video recordings of experimental/read and spontaneous speech from adult speakers of Porteño Spanish in Argentina. Speakers are 18-69 years old and from two...

Catalan in a bilingual context (PhonCAT)

Audio recordings of prompted, read and spontaneous speech data from L1 Catalan speakers from Barcelona. The data is stratified according to three different city districts and...

16 datasets found