The Berta Spoken Corpus contains six hours of recorded speech from a variety of interactional settings. It comprises 57 speech events, some captured on video and others, such as telephone calls and private conversations, recorded as audio only. The settings represented are public speaking, public appearances, public lectures, advertisements, cooking shows, casual conversations, advice sessions and interviews. The corpus was developed as part of the Slovene in the Palm of Your Hand (Slovenščina na dlani) project, designed to provide teachers with an additional tool for working with texts in primary and secondary schools.
All recordings are accompanied by manual transcriptions in two formats:
- Pronunciation-based (literal) transcription: This format provides a phoneme string generated from the orthographic form using letter-to-sound rules.
- Standardized (expanded) orthographic transcription: This format follows standard Slovene spelling to represent the spoken words, with additional rules and word lists applied for non-standard vocabulary.
The entry includes audio files (WAV, 44.1 kHz, 16-bit PCM), video files where available (MP4), and transcription files in TRS format (as originally produced with Transcriber 1.5.1) as well as text files.
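As a minimal sketch of how the files described above might be inspected, the Python snippet below checks that a WAV file matches the stated audio parameters and pulls time-stamped text out of a Transcriber-style TRS document. It uses only the standard library. The function names, the in-memory demo WAV and the sample TRS fragment are hypothetical and are not taken from the corpus; the sketch also assumes the usual Transcriber layout, in which each `Sync` element inside a `Turn` carries a `time` attribute and is followed by the transcribed text. Actual corpus files may contain additional elements that this does not handle.

```python
import io
import wave
import xml.etree.ElementTree as ET


def wav_matches_spec(source):
    """Return True if the WAV is 44.1 kHz with 16-bit (2-byte) samples.

    `source` may be a file path or a binary file-like object. The `wave`
    module reads uncompressed PCM only and raises wave.Error otherwise.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate() == 44100 and wav.getsampwidth() == 2


def trs_segments(trs_data):
    """Yield (start_time, text) pairs from Transcriber-style XML.

    Assumption: the text of a segment is the tail of its <Sync> element.
    Text that follows other inline elements within a turn is ignored here.
    """
    root = ET.fromstring(trs_data)
    for turn in root.iter("Turn"):
        for sync in turn.iter("Sync"):
            text = (sync.tail or "").strip()
            if text:
                yield float(sync.get("time")), text


def make_demo_wav():
    """Build a tiny silent mono WAV in memory (illustrative data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)       # 16-bit samples
        out.setframerate(44100)   # 44.1 kHz
        out.writeframes(b"\x00\x00" * 441)
    buf.seek(0)
    return buf


# Invented fragment in the general shape of a TRS file, not corpus data.
SAMPLE_TRS = """<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="3.0">
      <Turn speaker="spk1" startTime="0" endTime="3.0">
        <Sync time="0"/>
        dober dan
        <Sync time="1.5"/>
        kako ste
      </Turn>
    </Section>
  </Episode>
</Trans>"""


if __name__ == "__main__":
    print(wav_matches_spec(make_demo_wav()))
    for start, text in trs_segments(SAMPLE_TRS):
        print(start, text)
```

For files on disk, `wav_matches_spec` can be given a path directly, and the bytes of a TRS file can be passed to `trs_segments` so that the XML parser handles the encoding declared in the file.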