EXCEPTIUS Corpus v1.0, containing the following data:
- raw documents for 21 countries at national level
- data pre-processed with spacy-udpipe v1.0
- automatically annotated documents for the identification of exceptional measures at sentence level
Country list (ISO 3166-1 alpha-2 codes, except UK, which is used here for the United Kingdom instead of GB): AT, BE, HR, CY, CZ, DK, FR, DE, HU, IE, IT, LV, LT, NL, NO, PL, SI, SE, CH, UK
Folder structure: each country has a dedicated folder. Inside each folder you will find the following subfolders:
- raw_text: the raw text data (.txt format)
- processed: the output of spacy-udpipe v1.0. Each line is a sentence and contains the following information: tokens, lemmas, POS tags, UD dependency relations
- model: the predictions of the trained model (XML pre@36, as reported in Table 4 of the paper). Each line corresponds to a sentence and has 9 tab-separated fields, one per exceptional measure class. A value of 1 signals the presence of a class.
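The prediction format described for the model subfolder can be read with a few lines of Python. This is only a minimal sketch: it assumes that each line holds exactly 9 tab-separated integer flags and nothing else, and that the files are UTF-8 encoded. The function names and the 0-based class indices are illustrative and not part of the corpus; the mapping from column position to class name is not given here and must be taken from the paper.

```python
from pathlib import Path
from typing import Iterator, List

NUM_CLASSES = 9  # number of exceptional measure classes stated in this README


def parse_prediction_line(line: str) -> List[int]:
    """Turn one prediction line into a list of 9 integer flags.

    Assumption: the line contains only the 9 tab-separated flags.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} fields, got {len(fields)}")
    return [int(field) for field in fields]


def active_classes(flags: List[int]) -> List[int]:
    """Return the 0-based column indices whose flag is 1."""
    return [index for index, flag in enumerate(flags) if flag == 1]


def iter_predictions(path: Path) -> Iterator[List[int]]:
    """Yield the flags for every non-empty line (one sentence per line)."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_prediction_line(line)
```

For a line whose second and last fields are 1, `active_classes(parse_prediction_line(line))` gives the column indices 1 and 8. If the files turn out to carry additional fields (for example the sentence text), the length check will raise an error rather than silently misread the line.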
The Italy and Norway folders do not contain the model predictions.
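Because not every country folder carries all three subfolders, it can help to check what is present before processing. The sketch below assumes the corpus has been unpacked into a single root directory with one subdirectory per country; the function name and the root path are hypothetical, and only the subfolder names raw_text, processed and model come from this README.

```python
from pathlib import Path
from typing import Dict, List

# Subfolder names as listed in the folder structure above.
EXPECTED_SUBFOLDERS = ("raw_text", "processed", "model")


def missing_subfolders(corpus_root: Path) -> Dict[str, List[str]]:
    """Map each country folder name to the expected subfolders it lacks.

    Country folders that contain all expected subfolders are omitted.
    """
    report: Dict[str, List[str]] = {}
    country_dirs = sorted(p for p in corpus_root.iterdir() if p.is_dir())
    for country_dir in country_dirs:
        missing = [
            name
            for name in EXPECTED_SUBFOLDERS
            if not (country_dir / name).is_dir()
        ]
        if missing:
            report[country_dir.name] = missing
    return report
```

Calling `missing_subfolders(Path("path/to/corpus"))` returns a dictionary keyed by folder name, so folders without predictions can be skipped in any step that reads the model output.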