Read Speech Corpus (7G)

Dataset

The corpus of read Lithuanian speech "7G" was compiled in 2015-2016. It consists of 352 audio recordings with a total duration of over 7 hours. Seven speakers read excerpts from books and a list of isolated words (the list reflects the diversity of triphones in Lithuanian). The audio recordings are stored as 16-bit mono PCM WAV files sampled at 44.1 kHz. Annotations are stored in MLF (Master Label File) format, the format used by HTK. Most of the speakers are young women aged between 20 and 25. The aim was to obtain recordings in as natural an environment as possible, so no requirements were placed on the speakers' recording equipment, microphone settings or recording environment. Most of the speakers used personal laptops with a built-in microphone.
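
The formats named in the description can be checked with a few lines of Python. The following is a minimal sketch, not part of the corpus distribution: the function names are hypothetical, and the MLF reader assumes the usual HTK layout (a `#!MLF!#` header, a quoted file pattern per entry, label lines, and a lone `.` closing each entry). Whether the label lines in this corpus carry start and end times has not been verified here, so the fields are returned unparsed.

```python
import wave

# Audio parameters as stated in the corpus description.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "frame_rate_hz": 44100}


def wav_params(path):
    """Read the basic parameters of a PCM WAV file from its header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_description(path):
    """True if the file is 16-bit mono at 44.1 kHz, as the description states."""
    params = wav_params(path)
    return all(params[key] == value for key, value in EXPECTED.items())


def parse_mlf(text):
    """Parse HTK Master Label File text into {file_pattern: [label_fields, ...]}.

    Each label line is split on whitespace and kept as a list of strings,
    since a line may hold just a label or also start/end times and scores.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#!MLF!#":
        raise ValueError("missing #!MLF!# header")
    entries = {}
    current = None  # file pattern of the entry being read, if any
    for line in lines[1:]:
        if current is None:
            current = line.strip('"')  # e.g. "*/recording.lab"
            entries[current] = []
        elif line == ".":
            current = None  # a lone dot closes the entry
        else:
            entries[current].append(line.split())
    return entries
```

Summing `duration_s` over all recordings gives a quick check against the stated total of over 7 hours.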

Identifier
PID	http://hdl.handle.net/20.500.11821/58
Metadata Access	https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/58
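
The Metadata Access link is an OAI-PMH `GetRecord` request for the Dublin Core (`oai_dc`) record. As a rough sketch, the same URL can be built from the handle, and Dublin Core fields can be pulled out of a response that has already been downloaded. The helper names are hypothetical, and the response is assumed to use the standard Dublin Core element namespace; the output of the live server has not been checked here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://clarin.vdu.lt/oai/request"
DC_NS = "http://purl.org/dc/elements/1.1/"  # standard Dublin Core namespace


def get_record_url(handle, prefix="oai_dc"):
    """Build the OAI-PMH GetRecord URL for a repository handle."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": prefix,
        "identifier": f"oai:clarin.vdu.lt:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the listed link.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


def dc_fields(xml_text):
    """Collect Dublin Core elements from a response as {name: [values]}."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```

Calling `get_record_url("20.500.11821/58")` reproduces the Metadata Access link given in this record.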

Provenance
Creator	Raškinis, Gailius; Rudžionis, Vytautas
Publisher	Vytautas Magnus University; Vilnius University
Publication Year	2017
Rights	ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/ACA_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; ACA
OpenAccess	true
Contact	info(at)clarin.vdu.lt

Representation
Language	Lithuanian
Resource Type	corpus
Format	text/plain; application/zip; text/plain; charset=utf-8; downloadable_files_count: 3
Discipline	Linguistics