Frequency list of language problems from Šolar 3.0

Dataset

The dataset comprises 36,570 examples of student writing from Slovenian primary and secondary schools, together with authentic (teacher-provided) corrections of the language problems they contain.

Teacher corrections are categorised into 180 types, using a hierarchically structured system of labels described in the attached document (in Slovenian). Every entry carries metadata such as the type of the source text, the educational stage of the author, and the type and region of the school where the text was produced (see README for more information).

The data was exported from the Šolar 3.0 corpus (http://hdl.handle.net/11356/1589). The dataset is intended to make the corpus material easier to use for teaching, for statistical analyses of language problems in Slovenian primary and secondary education, and for machine learning.

Identifier
PID	http://hdl.handle.net/11356/1716
Related Identifier	https://www.cjvt.si/prop/en/
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1716
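
The Metadata Access address above is an OAI-PMH GetRecord request. As a minimal sketch, it can be assembled from the handle suffix as shown below. The endpoint, the oai_dc prefix, and the identifier pattern are taken from the address in this record. The function names build_get_record_url and fetch_record are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the Metadata Access field of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"


def build_get_record_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL for a handle such as '11356/1716'.

    Hypothetical helper: the identifier pattern 'oai:www.clarin.si:<handle>'
    is inferred from the single URL given in this record.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the published URL.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


def fetch_record(handle: str) -> str:
    """Download the metadata record as XML text (requires network access)."""
    with urlopen(build_get_record_url(handle)) as response:
        return response.read().decode("utf-8")


url = build_get_record_url("11356/1716")
print(url)
```

Calling fetch_record("11356/1716") should return the Dublin Core description of this dataset as XML, provided the repository is reachable. The same helper can be pointed at the Šolar 3.0 corpus by passing "11356/1589", on the assumption that its record follows the same identifier pattern.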

Provenance
Creator	Arhar Holdt, Špela; Rozman, Tadeja; Stritar Kučuk, Mojca; Krek, Simon; Krapš Vodopivec, Irena; Stabej, Marko; Pori, Eva; Goli, Teja; Lavrič, Polona; Laskowski, Cyprian; Kocjančič, Polonca; Klemenc, Bojan; Krsnik, Luka; Žagar, Aleš; Kosem, Iztok
Publisher	Centre for Language Resources and Technologies, University of Ljubljana
Publication Year	2022
Rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	lexicalConceptualResource
Format	text/plain; charset=utf-8; application/octet-stream; text/plain; application/pdf; downloadable_files_count: 3
Discipline	Linguistics