EXCEPTIUS Corpus

Dataset

EXCEPTIUS Corpus v1.0, containing the following data:
- raw documents for 21 countries at national level
- pre-processed data with spacy-udpipe v1.0
- automatically annotated documents for the identification of exceptional measures at sentence level

Country list (ISO 3166-1 alpha-2): AT, BE, HR, CY, CZ, DK, FR, DE, HU, IE, IT, LV, LT, NL, NO, PL, SI, SE, CH, UK

Folder structure: each country has a dedicated folder. Inside each folder you will find the following subfolders:
- raw_text: the raw text data (.txt format)
- processed: the output of spacy-udpipe v1.0. Each line is a sentence, containing the following information: tokens, lemmas, POS tags, UD dependency relations
- model: the predictions of the trained model (XML pre@36 as reported in Table 4 of the paper). Each line is a sentence, separated by 9 tabs, one for each exceptional measure class. A value of 1 signals the presence of a class.
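The prediction format described above could be read along the following lines. This is a minimal sketch, not the authors' loader: it assumes that each line holds the sentence text followed by 9 tab-separated 0/1 flags, which is one possible reading of the description. The function name `parse_prediction_line` and the constant `N_CLASSES` are hypothetical, and the order of the classes is not stated in this record.

```python
from typing import List, Tuple

# Number of exceptional measure classes, taken from the description above.
N_CLASSES = 9


def parse_prediction_line(line: str, n_classes: int = N_CLASSES) -> Tuple[str, List[int]]:
    """Split one prediction line into (sentence, indices of predicted classes).

    Assumption: the last `n_classes` tab-separated fields are the 0/1 flags,
    and anything before them is the sentence text.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < n_classes:
        raise ValueError(f"expected at least {n_classes} fields, got {len(fields)}")
    flags = fields[-n_classes:]
    sentence = "\t".join(fields[:-n_classes])
    # A flag of "1" signals the presence of the class at that position.
    active = [i for i, flag in enumerate(flags) if flag.strip() == "1"]
    return sentence, active
```

If the files turn out to contain only the flags, the same function returns an empty sentence string and the class indices are still usable.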

The Italy and Norway folders do not contain the model predictions.
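Because not every country folder contains all three subfolders, a quick check after unpacking the archive may be useful. The sketch below assumes that the subfolders on disk are named exactly `raw_text`, `processed`, and `model`, and that the country folders sit directly under a single root directory. Both points are assumptions about the archive layout, and the function name `missing_subfolders` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Subfolder names as listed in the folder structure description (assumed to
# match the names on disk exactly).
EXPECTED_SUBFOLDERS = ("raw_text", "processed", "model")


def missing_subfolders(root: str) -> Dict[str, List[str]]:
    """Map each country folder under `root` to the expected subfolders it lacks."""
    report: Dict[str, List[str]] = {}
    for country_dir in sorted(Path(root).iterdir()):
        if not country_dir.is_dir():
            continue  # skip loose files such as documentation
        report[country_dir.name] = [
            name for name in EXPECTED_SUBFOLDERS if not (country_dir / name).is_dir()
        ]
    return report
```

Under these assumptions, the Italy and Norway folders would be reported with `model` missing.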

Identifier
DOI	https://doi.org/10.34894/ZUWAPS
Metadata Access	https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/ZUWAPS
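The metadata access link above is an OAI-PMH `GetRecord` request, so it can be assembled from its parts for this or another DOI. The following is a minimal sketch using only the Python standard library. The function names `build_oai_url` and `fetch_metadata` are hypothetical, and the fetch step assumes the endpoint is reachable and returns XML.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_ENDPOINT = "https://dataverse.nl/oai"


def build_oai_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build the OAI-PMH GetRecord URL for a DOI such as '10.34894/ZUWAPS'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the link in this record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


def fetch_metadata(doi: str) -> str:
    """Download the raw metadata record as text (requires network access)."""
    with urlopen(build_oai_url(doi)) as response:
        return response.read().decode("utf-8")
```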

Provenance
Creator	Caselli, Tommaso; Egger, Clara; Tziafas, Georgios; De Saint-Phalle, Eugenie
Publisher	DataverseNL
Contributor	Caselli, Tommaso
Publication Year	2021
Funding Reference	ZonMw, 10430032010026
Rights	CC0 Waiver; info:eu-repo/semantics/openAccess; https://creativecommons.org/publicdomain/zero/1.0/
OpenAccess	true
Contact	Caselli, Tommaso (University of Groningen)

Representation
Resource Type	legal texts; Dataset
Format	application/vnd.openxmlformats-officedocument.wordprocessingml.document; application/zip; application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size	6820; 7681; 233842395; 9233
Version	1.0
Discipline	Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Humanities; Jurisprudence; Law; Life Sciences; Social Sciences; Social and Behavioural Sciences; Soil Sciences