Slovenian parliamentary corpus (1990-2022) siParl 4.0

Dataset

The siParl 4.0 corpus contains minutes of the Assembly of the Republic of Slovenia for the 11th legislative period (1990-1992), minutes of the National Assembly of the Republic of Slovenia from the 1st to the 8th legislative period (1992-2022), minutes of the working bodies of the National Assembly of the Republic of Slovenia from the 2nd to the 8th legislative period (1996-2022), and minutes of the Council of the President of the National Assembly from the 2nd to the 8th legislative period (1996-2022). The corpus comprises over 13 thousand sessions, one million speeches, and 230 million words. It is encoded according to the Parla-CLARIN schema (https://github.com/clarin-eric/parla-clarin). Each mandate is stored in its own directory and each session in its own file.
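The directory-per-mandate, file-per-session layout described above can be explored with a short script. The following is a minimal Python sketch, not part of the corpus distribution. It assumes that session files carry an `.xml` extension and sit directly inside each mandate directory, and that speeches are encoded as TEI `<u>` elements with a `who` attribute, as is usual in Parla-CLARIN; the helper names and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# TEI namespace used by Parla-CLARIN documents.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Invented sample standing in for one session file (assumption: speeches are <u who="...">).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div>
<u who="#SpeakerA"><seg>Prvi govor.</seg></u>
<u who="#SpeakerB"><seg>Drugi govor.</seg></u>
<u who="#SpeakerA"><seg>Tretji govor.</seg></u>
</div></body></text></TEI>"""


def count_speeches(tei_xml: str) -> Counter:
    """Count <u> elements per speaker reference in one TEI document string."""
    root = ET.fromstring(tei_xml)
    return Counter(u.get("who", "unknown") for u in root.iter(f"{TEI_NS}u"))


def sessions_per_mandate(corpus_root: Path) -> dict[str, int]:
    """Count .xml files in each immediate subdirectory of corpus_root.

    Assumption: one subdirectory per mandate, one .xml file per session.
    """
    return {
        d.name: sum(1 for _ in d.glob("*.xml"))
        for d in sorted(corpus_root.iterdir())
        if d.is_dir()
    }


if __name__ == "__main__":
    print(count_speeches(SAMPLE))
```

To apply this to the unpacked corpus, pass the top-level corpus directory to `sessions_per_mandate` and the contents of a session file to `count_speeches`; the actual file naming in the distribution may differ from what is assumed here.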

Compared with the previous version 3.0, this version adds new data (minutes of the National Assembly of the Republic of Slovenia of the 8th legislative period) and corrects many errors.

This item comprises the following datasets:
1. source DARIAH-SI Parla-CLARIN encoded corpus in TEI format;
2. linguistically annotated Parla-CLARIN encoded corpus: tokenisation, MSD tagging, lemmatisation, Universal Dependencies features and syntactic parses, named entities;
3. automatically derived corpus in plain text with metadata on speeches;
4. automatically derived linguistically annotated corpus in CoNLL-U (Universal Dependencies) format with metadata on speeches;
5. automatically derived linguistically annotated corpus in vertical format used by CWB and Sketch Engine concordancers, together with the registry file as used on the CLARIN.SI concordancers.
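For the CoNLL-U derivative listed above, a dedicated library is not strictly required: the format consists of `#`-prefixed comment lines, token lines with ten tab-separated columns, and blank lines between sentences. The following is a minimal Python sketch of a reader under those general CoNLL-U rules. The function name and the sample sentence (including its annotation) are invented for illustration and are not taken from the corpus; the metadata keys actually used in siParl files are not assumed here.

```python
from typing import Iterator

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)

# Invented sample sentence; the annotation is illustrative only.
SAMPLE = "\n".join([
    "# sent_id = example.1",
    "# text = Hvala lepa.",
    "\t".join(["1", "Hvala", "hvala", "NOUN", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "lepa", "lep", "ADJ", "_", "_", "1", "amod", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"]),
    "",
])


def read_conllu(text: str) -> Iterator[dict]:
    """Yield one dict per sentence: {'meta': {...}, 'tokens': [...]}."""
    meta, tokens = {}, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield {"meta": meta, "tokens": tokens}
            meta, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines of the form "# key = value" become metadata.
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) == len(CONLLU_FIELDS):
                tokens.append(dict(zip(CONLLU_FIELDS, cols)))
    # Emit a trailing sentence that is not followed by a blank line.
    if tokens:
        yield {"meta": meta, "tokens": tokens}


if __name__ == "__main__":
    for sentence in read_conllu(SAMPLE):
        print(sentence["meta"].get("text"), [t["lemma"] for t in sentence["tokens"]])
```

Because the corpus is large, real files are better processed as a stream (reading line by line from an open file) than loaded whole into a string as in this sketch.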

Identifier
PID	http://hdl.handle.net/11356/1936
Related Identifier	http://hdl.handle.net/11356/1748
Related Identifier	https://doi.org/10.1007/s10579-024-09746-8
Related Identifier	https://github.com/DARIAH-SI/siParl/
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1936

Provenance
Creator	Pančur, Andrej; Meden, Katja; Erjavec, Tomaž; Ojsteršek, Mihael; Šorn, Mojca; Blaj Hribar, Neja
Publisher	Institute of Contemporary History
Publication Year	2024
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; downloadable_files_count: 5
Discipline	Linguistics