NeMo Conformer CTC BPE E2E Automated Speech Recognition service RSDO-DS2-ASR-E2E-API 1.1

Automated Speech Recognition service for NeMo Conformer CTC BPE E2E models. For more details about building such models, see the official NVIDIA NeMo documentation (https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/intro.html) and NVIDIA NeMo GitHub (https://github.com/NVIDIA/NeMo). A model for automated speech recognition of Slovene speech can be downloaded from http://hdl.handle.net/11356/1740.

The service accepts audio files in WAV format: 16 kHz, 16-bit PCM, mono. The maximum accepted audio duration is 300 s. Note that transcribing a single 300 s audio file on CPU uses all available cores, consumes up to 16 GB of RAM, and may take about 180 s (on a system with 24 vCPUs). See the service README.md for further details.
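Because a rejected or slow request is costly at these resource levels, it can help to check a file against the stated input limits before submitting it. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_wav` and the constant names are illustrative assumptions, not part of the service or its API; only the limits themselves (16 kHz, 16-bit PCM, mono, at most 300 s) come from the description above.

```python
import wave

# Limits taken from the service description; the names are illustrative.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
EXPECTED_CHANNELS = 1            # mono
MAX_DURATION_S = 300.0


def check_wav(path):
    """Return a list of problems; an empty list means the file fits the limits."""
    problems = []
    # wave.open raises wave.Error if the file is not a PCM WAV file.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {8 * width} bit, expected 16 bit")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels found, expected mono")

    # Duration in seconds is the frame count divided by the sample rate.
    duration = frames / rate if rate else 0.0
    if duration > MAX_DURATION_S:
        problems.append(f"duration is {duration:.1f} s, maximum is {MAX_DURATION_S:.0f} s")
    return problems
```

A caller might run `check_wav("recording.wav")` (a hypothetical file name) and submit the file only when the returned list is empty. Files that exceed the duration limit would need to be split, and files in another sample rate or channel layout resampled, with a separate tool.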

Identifier
PID http://hdl.handle.net/11356/1740
Related Identifier https://rsdo.slovenscina.eu/en/speech-technologies
Related Identifier https://github.com/clarinsi/Slovene_ASR_e2e
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1740
Provenance
Creator Lebar Bajec, Iztok; Bajec, Marko; Bajec, Žan
Publisher Faculty of Computer and Information Science, University of Ljubljana
Publication Year 2022
Rights Apache License 2.0; https://opensource.org/licenses/Apache-2.0; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Resource Type toolService
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 1
Discipline Linguistics