The dataset comprises 36570 examples of student writing from Slovenian primary and secondary schools, together with authentic (teacher-provided) corrections of language problems in these sentences.
Teacher corrections are categorised into 180 types, using a hierarchically structured system of labels described in the attached document (in Slovenian). Every entry is accompanied by metadata, such as the type of the source text, the educational stage of the author, and the type and region of the school where the text was produced (see README for more information).
The data is exported from the Šolar 3.0 corpus (http://hdl.handle.net/11356/1589). The dataset is intended to make this data easier to use for teaching, for statistical analyses of language problems in Slovenian primary and secondary education, and for machine learning.