The sentiment corpus of parliamentary debates ParlaSent-BCS v1.0

The dataset consists of mid-length sentences from Bosnian, Croatian and Serbian parliamentary proceedings, annotated with a 6-level sentiment schema (defined below). The first 1,300 instances were annotated by two annotators, and a reconciliation procedure was performed wherever they disagreed on the simplified 3-level schema (Positive, Negative, Neutral). The remaining 1,300 instances were annotated by the second annotator only. In addition to the labels of both annotators and any reconciliation labels, a simplified 3-level label is provided for all instances.
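The rule for which instances go to reconciliation can be sketched as follows. This is a minimal sketch, not the authors' tooling: it assumes each annotator's 6-level label has already been collapsed to the 3-level schema (the record does not spell out that mapping), and the `Instance` record and its field names are hypothetical, not the dataset's actual columns.

```python
from dataclasses import dataclass
from typing import List, Optional

DOUBLE_ANNOTATED = 1300  # the first 1,300 instances were labeled by two annotators


@dataclass
class Instance:
    """Hypothetical record; the dataset's real column names may differ."""
    position: int              # 0-based position in annotation order
    annotator1: Optional[str]  # first annotator's 3-level label, None if absent
    annotator2: str            # second annotator's 3-level label


def needs_reconciliation(inst: Instance) -> bool:
    """True when a double-annotated instance has conflicting 3-level labels."""
    if inst.position >= DOUBLE_ANNOTATED or inst.annotator1 is None:
        return False  # single-annotated instances are never reconciled
    return inst.annotator1 != inst.annotator2


def reconciliation_queue(instances: List[Instance]) -> List[Instance]:
    """Select the instances that would be passed to the reconciliation step."""
    return [inst for inst in instances if needs_reconciliation(inst)]
```

Comparing on the 3-level labels means that, for example, two different 6-level labels that collapse to the same 3-level class would not trigger reconciliation.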

Each sentence can be traced back to the original datasets (https://doi.org/10.5281/zenodo.6517697, https://doi.org/10.5281/zenodo.6521372, https://doi.org/10.5281/zenodo.6521648) via a document identifier and a sentence identifier. The date of the speech and the speaker's name are also given. If the speaker is an MP, their party, gender and year of birth are available as well. The dataset is split into a training subset (2,150 instances), a development subset (150 instances) and a test subset (300 instances).
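Tracing an annotated sentence back to its source amounts to a keyed lookup on the (document identifier, sentence identifier) pair. The sketch below assumes hypothetical field names `document_id`, `sentence_id` and `text`; the actual column names in ParlaSent-BCS and in the original corpora may differ. It also checks that the stated split sizes add up to the 2,600 annotated instances.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup key: (document identifier, sentence identifier).
SentenceKey = Tuple[str, str]


def build_index(original_sentences: List[dict]) -> Dict[SentenceKey, str]:
    """Index the sentences of an original corpus by (document_id, sentence_id)."""
    return {(s["document_id"], s["sentence_id"]): s["text"] for s in original_sentences}


def trace_back(instance: dict, index: Dict[SentenceKey, str]) -> str:
    """Return the original sentence for an annotated instance (KeyError if missing)."""
    return index[(instance["document_id"], instance["sentence_id"])]


# Split sizes as stated in the record; together they cover 1,300 + 1,300 instances.
SPLITS = {"train": 2150, "dev": 150, "test": 300}
assert sum(SPLITS.values()) == 2 * 1300
```

A dictionary keyed on the identifier pair keeps each lookup constant-time, which matters when joining against full parliamentary corpora rather than a single file.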

The full 6-level annotation schema is the following:
- Positive for sentences that are entirely or predominantly positive
- Negative for sentences that are entirely or predominantly negative
- M_Positive for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the positive sentiment in a strict binary classification
- M_Negative for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the negative sentiment in a strict binary classification
- P_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the positive sentiment in a strict binary classification
- N_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the negative sentiment in a strict binary classification
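Because every label in the schema states which way it leans in a strict binary classification, the 6 levels collapse directly to a positive/negative decision. A minimal sketch of that collapse follows; it is derived only from the label definitions above and is not necessarily the same as the 3-level label distributed with the dataset, whose mapping this record does not state.

```python
# Strict binary collapse of the 6-level schema, following the
# "lean more towards" part of each label's definition.
BINARY = {
    "Positive": "positive",
    "M_Positive": "positive",
    "P_Neutral": "positive",
    "Negative": "negative",
    "M_Negative": "negative",
    "N_Neutral": "negative",
}


def to_binary(label: str) -> str:
    """Map a 6-level label to 'positive' or 'negative'; raises KeyError on unknown labels."""
    return BINARY[label]
```

Raising on unknown labels, rather than defaulting to one class, surfaces typos or schema mismatches early.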

Identifier
PID http://hdl.handle.net/11356/1585
Related Identifier https://www.clarin.eu/parlamint
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1585
Provenance
Creator Mochtak, Michal; Rupnik, Peter; Ljubešić, Nikola
Publisher Jožef Stefan Institute
Publication Year 2022
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Bosnian; Croatian; Serbian
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; downloadable_files_count: 1
Discipline Linguistics