Frequency list of language problems from Šolar 3.0

The dataset comprises 36,570 examples of student writing from Slovenian primary and secondary schools, together with authentic (teacher-provided) corrections of the language problems in these examples.

Teacher corrections are categorised into 180 types, using a hierarchically structured system of labels described in the attached document (in Slovenian). Each entry is accompanied by metadata, such as the type of the source text, the educational stage of the author, and the type and region of the school where the text was produced (see README for more information).

The data was exported from the Šolar 3.0 corpus (http://hdl.handle.net/11356/1589). The dataset is intended to make the corpus data easier to use for teaching, for statistical analyses of language problems in Slovenian primary and secondary education, and for machine learning.

Identifier
PID http://hdl.handle.net/11356/1716
Related Identifier https://www.cjvt.si/prop/en/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1716
Provenance
Creator Arhar Holdt, Špela; Rozman, Tadeja; Stritar Kučuk, Mojca; Krek, Simon; Krapš Vodopivec, Irena; Stabej, Marko; Pori, Eva; Goli, Teja; Lavrič, Polona; Laskowski, Cyprian; Kocjančič, Polonca; Klemenc, Bojan; Krsnik, Luka; Žagar, Aleš; Kosem, Iztok
Publisher Centre for Language Resources and Technologies, University of Ljubljana
Publication Year 2022
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; application/octet-stream; text/plain; application/pdf; downloadable_files_count: 3
Discipline Linguistics
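
The Metadata Access entry above is an OAI-PMH GetRecord request. As a minimal sketch, the following Python snippet shows how such a request URL can be assembled from the handle suffix. The function name build_get_record_url and the constant OAI_ENDPOINT are hypothetical names introduced for illustration; the endpoint, the verb, the metadata prefix and the identifier scheme are copied from the record itself.

```python
from urllib.parse import urlencode

# Endpoint taken from the "Metadata Access" entry of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"


def build_get_record_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Assemble an OAI-PMH GetRecord URL for a handle suffix such as '11356/1716'.

    Hypothetical helper: it only builds the URL and does not contact the server.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        # Identifier scheme as it appears in the record: oai:www.clarin.si:<handle>
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the URL listed in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = build_get_record_url("11356/1716")
print(url)
```

The resulting URL could then be fetched with any HTTP client to obtain the Dublin Core description of the dataset as XML; the sketch makes no assumption about the structure of the response beyond what the OAI-PMH protocol defines.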