This Punctuation and Capitalisation model was trained following the NVIDIA NeMo Punctuation and Capitalisation recipe (for details see the official NVIDIA NeMo P&C documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/punctuation_and_capitalization.html, and NVIDIA NeMo GitHub repository https://github.com/NVIDIA/NeMo). It provides functionality for restoring punctuation (,.!?) and capital letters in lowercased non-punctuated Slovene text.
The training corpus was built from publicly available datasets, as well as a small portion of proprietary data. In total the training corpus consisted of 38.829.529 sentences and the validation corpus consisted of 2.092.497 sentences.