This Conformer-CTC BPE end-to-end (E2E) automatic speech recognition model was trained following the NVIDIA NeMo Conformer-CTC recipe (for details see the official NVIDIA NeMo ASR documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/intro.html, and the NVIDIA NeMo GitHub repository, https://github.com/NVIDIA/NeMo). It transcribes Slovene speech to text.
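A minimal sketch of how transcription with this model might look using NeMo's CTC BPE model class. The checkpoint filename `conformer_ctc_bpe_slovene.nemo` is a placeholder, not the actual released artifact name; this assumes `nemo_toolkit` is installed.

```python
# Hedged sketch: loads a Conformer-CTC BPE checkpoint and transcribes
# a list of WAV files. The checkpoint path below is hypothetical.
def transcribe_slovene(audio_paths):
    # Import inside the function so this sketch can be read and loaded
    # even where nemo_toolkit is not installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.EncDecCTCModelBPE.restore_from(
        "conformer_ctc_bpe_slovene.nemo"  # hypothetical filename
    )
    # transcribe() accepts a list of audio file paths and returns
    # a list of text transcripts.
    return model.transcribe(audio_paths)

# Example call (requires a real checkpoint and audio files):
# print(transcribe_slovene(["posnetek.wav"]))
```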
The training, development, and test datasets were derived from the Artur dataset and consist of 630.38, 16.48, and 15.12 hours of transcribed speech in standardised form, respectively. The model was trained for 200 epochs and reached a word error rate (WER) of 0.0429 on the development set and 0.0558 on the test set.
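For reference, the reported WER figures are the word-level edit distance between the reference and hypothesis transcripts divided by the number of reference words. A small self-contained illustration (not the evaluation script used for this model):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if words match)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, one substituted word in a four-word reference gives a WER of 0.25.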