The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom, annotated with a 6-level sentiment schema (defined below). Because the official languages of Bosnia and Herzegovina, Croatia and Serbia are very similar, the data from these three parliaments are organised as a single parliament group, named "BCS". For each of the five parliaments / parliament groups, 2,600 training instances were annotated by two annotators, followed by an additional conflict-resolution step. While the training instances were sampled using sentiment lexicons so that they contain more sentiment-loaded sentences, the two test sets were sampled randomly from selected parliaments: one from the BCS parliament group and one from the parliament of the United Kingdom. Each test set consists of 2,600 sentences, annotated by a single highly trained annotator. The training datasets were internally split into "train", "dev" and "test" portions for performing language-specific experiments.
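To make the grouping concrete, the sketch below maps each country to its parliament group and derives the resulting counts from the figures above: 2,600 training instances per group, and two test sets of 2,600 sentences each. The group codes other than "BCS" (e.g. "CZ", "GB") are illustrative assumptions, not identifiers taken from the dataset release.

```python
# Country -> parliament group, following the grouping described above.
# NOTE: codes other than "BCS" are hypothetical labels chosen for illustration.
PARLIAMENT_GROUP = {
    "Bosnia and Herzegovina": "BCS",
    "Croatia": "BCS",
    "Serbia": "BCS",
    "Czechia": "CZ",
    "Slovakia": "SK",
    "Slovenia": "SI",
    "United Kingdom": "GB",
}

TRAIN_PER_GROUP = 2_600       # doubly annotated training instances per group
TEST_SET_SIZE = 2_600         # sentences per randomly sampled test set
TEST_GROUPS = ("BCS", "GB")   # test sets exist only for these two groups

# Seven countries collapse into five groups once BCS is merged.
groups = sorted(set(PARLIAMENT_GROUP.values()))
train_total = len(groups) * TRAIN_PER_GROUP
test_total = len(TEST_GROUPS) * TEST_SET_SIZE

print(groups)       # ['BCS', 'CZ', 'GB', 'SI', 'SK']
print(train_total)  # 13000
print(test_total)   # 5200
```

Deriving the group list from the mapping, rather than hard-coding a count, keeps the totals consistent with the country list if the grouping ever changes.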
The 6-level annotation schema is as follows:
- Positive for sentences that are entirely or predominantly positive
- Negative for sentences that are entirely or predominantly negative
- M_Positive for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the positive sentiment
- M_Negative for sentences that convey an ambiguous sentiment or a mixture of sentiments, but lean more towards the negative sentiment
- P_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the positive sentiment
- N_Neutral for sentences that only contain non-sentiment-related statements, but still lean more towards the negative sentiment
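The six labels can be read as a 3 × 2 grid: a kind (predominantly sentiment-bearing, mixed or ambiguous, or neutral) crossed with a lean (positive or negative). The minimal Python sketch below encodes that reading; the attribute names `kind` and `lean` and the helper `from_label` are hypothetical, not part of any released loader.

```python
from enum import Enum


class Sentiment(Enum):
    """The six schema labels, each tagged with its kind and the direction it leans.

    The `kind`/`lean` decomposition is an illustrative reading of the schema,
    not an official field of the dataset.
    """

    POSITIVE = ("Positive", "sentiment", "positive")
    NEGATIVE = ("Negative", "sentiment", "negative")
    M_POSITIVE = ("M_Positive", "mixed", "positive")
    M_NEGATIVE = ("M_Negative", "mixed", "negative")
    P_NEUTRAL = ("P_Neutral", "neutral", "positive")
    N_NEUTRAL = ("N_Neutral", "neutral", "negative")

    def __init__(self, label, kind, lean):
        # Enum passes the tuple value unpacked into __init__.
        self.label = label  # label string as written in the schema
        self.kind = kind    # "sentiment", "mixed" or "neutral"
        self.lean = lean    # "positive" or "negative"

    @classmethod
    def from_label(cls, label):
        """Look up a member by its schema label string (hypothetical helper)."""
        for member in cls:
            if member.label == label:
                return member
        raise ValueError(f"unknown label: {label!r}")


s = Sentiment.from_label("M_Negative")
print(s.kind, s.lean)  # mixed negative
```

Every label, including the two neutral ones, carries a lean, which is why no plain "Neutral" label exists in the schema; `from_label("Neutral")` raises `ValueError`.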