-
Modelling word learning and recognition using visually grounded speech
A set of recorded isolated nouns, verbs and image annotations used for testing the word recognition performance of our speech2image model. We trained a word recognition model... -
Prague Dependency Treebank - Consolidated 2.0 (PDT-C 2.0)
A manually annotated and genre-diversified language resource with rich linguistic information from morphology and syntax to semantics, the Prague Dependency Treebank –... -
Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)
A richly annotated and genre-diversified language resource, the Prague Dependency Treebank – Consolidated 1.0 (PDT-C 1.0, or PDT-C in short in the sequel) is a consolidated... -
Acoustic Data Building Toolset
This folder contains data and software tools (in Python) that can be used in experiments with phoneme recognition in speech samples recorded in Polish. Acoustic data used here... -
Speech Recognition System for Polish: Studio Quality
This resource contains dockerized models and scripts of an automatic speech recognition system for Polish trained on studio quality speech. The system is based on the Kaldi... -
Speech Recognition System for Polish: Parliamentary Speech
This resource contains dockerized models and scripts of an automatic speech recognition system for Polish trained on Polish Parliament speeches. The system is based on the Kaldi... -
DiaBiz ASR benchmark
An evaluation report with accompanying datasets benchmarking the performance of commercially available ASR services for Polish on the DiaBiz corpus. -
Speech Recognition System for Polish: Polish Film Chronicles
This resource contains dockerized models and scripts of an automatic speech recognition system for Polish trained on recordings of the Polish Film Chronicles. The system is based... -
Spoken corpus Gos VideoLectures 4.0 (audio)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus... -
Parliamentary spoken corpus of Croatian ParlaSpeech-HR 2.0
The ParlaSpeech-HR dataset is built from the transcripts of parliamentary proceedings available in the Croatian part of the ParlaMint corpus, and the parliamentary recordings... -
Spoken corpus Gos VideoLectures 2.0 (transcription)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus... -
Parliamentary spoken corpus of Polish ParlaSpeech-PL 1.0
The ParlaSpeech-PL dataset is built from the transcripts of parliamentary proceedings available in the Polish part of the ParlaMint corpus, and the parliamentary recordings... -
Spoken corpus Gos VideoLectures 4.0 (transcription)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. The Gos VideoLectures corpus... -
Spoken corpus Gos VideoLectures 4.2 (transcription)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training... -
Slovene Conformer CTC BPE E2E Automated Speech Recognition model RSDO-DS2-ASR...
This Conformer CTC BPE E2E Automated Speech Recognition model was trained following the NVIDIA NeMo Conformer-CTC recipe (for details see the official NVIDIA NeMo NMT... -
Parliamentary spoken corpus of Czech ParlaSpeech-CZ 1.0
The ParlaSpeech-CZ dataset is built from the transcripts of parliamentary proceedings available in the Czech part of the ParlaMint corpus, and the parliamentary recordings... -
Spoken corpus Gos VideoLectures 4.1 (transcription)
Gos VideoLectures is an add-on to the Gos reference corpus of spoken Slovene (http://hdl.handle.net/11356/1040), and covers public academic speech. It can be used for training... -
ASR training dataset for Croatian ParlaSpeech-HR v1.0
The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of the ParlaMint corpus and the parliamentary recordings available from the... -
ASR training dataset for Serbian JuzneVesti-SR v1.0
The JuzneVesti-SR dataset consists of audio recordings and manual transcripts from the Južne Vesti website and its host show called '15 minuta'... -
Parliamentary spoken corpus of Serbian ParlaSpeech-RS 1.0
The ParlaSpeech-RS dataset is built from the transcripts of parliamentary proceedings available in the Serbian part of the ParlaMint (ParlaMint-RS) corpus, and the parliamentary...