The CLASSLA-Stanza model for morphosyntactic annotation of standard Slovenian 2.0

Dataset

This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204), which were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 score of the XPOS annotations is ~98.27.

Compared with the previous version, this model was trained on the SUK training corpus and uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
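
The three label types mentioned above (UPOS, FEATS, XPOS) correspond to columns 4, 6 and 5 of the standard 10-column CoNLL-U format. The following is a minimal sketch of reading them, assuming the annotations have been exported as CoNLL-U. The function name `read_morphosyntax` and the sample sentence are hypothetical; the sample was written by hand for illustration and is not actual output of this model.

```python
# Hand-written CoNLL-U sample for illustration only; NOT real model output.
SAMPLE = (
    "# text = Hiša stoji.\n"
    "1\tHiša\thiša\tNOUN\tNcfsn\tCase=Nom|Gender=Fem|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tstoji\tstati\tVERB\tVmpr3s\t"
    "Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"
    "\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\tZ\t_\t2\tpunct\t_\t_\n"
)


def read_morphosyntax(conllu):
    """Return a list of (form, upos, xpos, feats_dict), one per token line."""
    rows = []
    for line in conllu.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges (e.g. "1-2")
        # and empty nodes (e.g. "1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, xpos, feats = cols[1], cols[3], cols[4], cols[5]
        # FEATS is "_" when empty, otherwise "Key=Value" pairs joined by "|".
        if feats == "_":
            feats_dict = {}
        else:
            feats_dict = dict(pair.split("=", 1) for pair in feats.split("|"))
        rows.append((form, upos, xpos, feats_dict))
    return rows


for form, upos, xpos, feats in read_morphosyntax(SAMPLE):
    print(form, upos, xpos, feats)
```

Keeping FEATS as a dictionary makes it straightforward to filter tokens by a single feature (for example `feats.get("Case") == "Nom"`), while the XPOS string retains the compact MULTEXT-East tag.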

Identifier
PID	http://hdl.handle.net/11356/1767
Related Identifier	https://aclanthology.org/W19-3704/
Related Identifier	http://hdl.handle.net/11356/1476
Related Identifier	https://github.com/clarinsi/classla
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1767
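
The Metadata Access link above is an OAI-PMH `GetRecord` request. As a rough sketch, such a URL can be assembled from a handle as shown below; the helper name `get_record_url` is hypothetical, and the base URL and identifier prefix are copied from the link in this record rather than taken from any documented client API.

```python
from urllib.parse import urlencode

# Base endpoint as it appears in the Metadata Access link of this record.
BASE = "http://www.clarin.si/repository/oai/request"


def get_record_url(handle, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for a handle such as '11356/1767'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        # Identifier prefix assumed from the link in this record.
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # Keep ":" and "/" unescaped so the result matches the listed link.
    return f"{BASE}?{urlencode(params, safe=':/')}"


print(get_record_url("11356/1767"))
```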

Provenance
Creator	Ljubešić, Nikola; Terčon, Luka; Čibej, Jaka
Publisher	Jožef Stefan Institute
Publication Year	2023
Rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	toolService
Format	text/plain; charset=utf-8; application/zip; downloadable_files_count: 2
Discipline	Linguistics