-
EKKD115: Dataset of Estonia's Multilingual Language Environment
This repository contains texts collected within the project "Eesti mitmekeelse keelekeskkonna andmestik" (Dataset of Estonia's Multilingual Language Environment) and a link to the picture map of linguistic landscapes. 1) Estonian-English bilingual... -
Spoken Estonian in Numbers. Frequency Datasets
This repository contains frequency datasets compiled within the project "Suuline eesti keel arvudes" (Spoken Estonian in Numbers) that describe spoken Estonian. The datasets are based on Eesti keele... -
ASR database ARTUR 0.1 (transcriptions)
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840... -
ASR database ARTUR 0.1 (audio)
ARTUR is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,035 hours of speech, although only 840... -
A Digital Dictionary of Tunis Arabic - TUNICO (ELEXIS)
A corpus-based dictionary, enriched with historical data. The dictionary was not only built on data from the corpus of spoken language that was compiled in the same project, but... -
ASR database ARTUR 1.0 (transcriptions)
Artur 1.0 is a speech database designed for the needs of developing automatic speech recognition for the Slovenian language. The complete database includes 1,067 hours of... -
Corpus of metaphorical expressions in spoken Slovene language G-KOMET 1.0
G-KOMET (a corpus of metaphorical expressions in spoken Slovene language) is an upgrade of the hand-annotated written corpus for metaphorical expressions KOMET... -
List of formulaic sequences in spoken Slovenian
This document contains 2,374 formulaic sequences in spoken Slovenian, i.e. frequently recurring strings of two to five words, manually annotated for syntactic structure,... -
ASR database ARTUR 1.0 (audio)
Artur 1.0 is a speech database designed for the needs of automatic speech recognition for the Slovenian language. The database includes 1,067 hours of speech. 884 hours are... -
TED-ELH Parallel Corpus
The corpus contains parallel-aligned scripts of TED Talks in English, Lithuanian, and Hebrew. It contains spoken language data. -
Prague Dependency Treebank of Spoken Language (PDTSL) 0.5
The first edition of a speech corpus with a speech reconstruction layer (edited transcript). The project of speech reconstruction of Czech and English was started at UFAL... -
Languages in Migration
LANGUAGES IN MIGRATION is designed as a representation of authentic spoken Czech and German that is used in informal speech (private environment, spontaneity, unpreparedness... -
Bavaria's Dialects Online
Bavaria's Dialects Online (BDO) is the digital language information system of the three projects "Bavarian Dictionary", "Franconian Dictionary", and "Dialectological Information... -
ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcri...
ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole... -
ORAL2013: balanced corpus of informal spoken Czech (transcriptions & audio)
ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole... -
ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcri...
ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole... -
ORAL2013: balanced corpus of informal spoken Czech (transcriptions)
ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole... -
Large-Scale Colloquial Persian 0.5
"Large Scale Colloquial Persian Dataset" (LSCP) is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a... -
The Kiezdeutsch Corpus "KiDKo": Additional Corpora
Additional corpus I, "Frog Story": oral presentation of the picture story (Mayer 1969), written reproduction of the "Frog Story" from memory. Additional corpus... -
The Kiezdeutsch Corpus (KiDKo)
A multi-modal digital corpus of spontaneous discourse data from informal, oral peer group in multi- and monoethnic speech communities. Multimodal, digital corpus...