Dataset - B2FIND

ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcri...

ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

SYN2010: balanced corpus of written Czech

Balanced corpus of contemporary written Czech sized 100 MW. It was created as a representation of written language from 2005–2009 and thus it contains a wide range of text types...

SYN2005: balanced corpus of written Czech

Balanced corpus of contemporary written Czech sized 100 MW. It was created as a representation of written language from 2000–2004 and thus it contains a wide range of text types...

ORAL2008: Balanced corpus of informal spoken Czech

Balanced corpus of informal spoken Czech sized 1 MW. It contains transcriptions of 297 recordings made in 2002–2007 in the whole of Bohemia. All the recordings were made in...

ORAL2013: balanced corpus of informal spoken Czech (transcriptions & audio)

ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

ORTOFON v1: balanced corpus of informal spoken Czech with multi-tier transcri...

ORTOFON v1 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

Corpus "Miljons"

Balanced corpus of Modern Latvian (~1 million running words, currently in plain text), publicly available via the Bonito interface.

ORAL2013: balanced corpus of informal spoken Czech (transcriptions)

ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (private environment, spontaneity, unpreparedness etc.) in the area of the whole...

8 datasets found