- EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news a...
  This dataset contains articles from EMBEDDIA Media partners with various information added by the tools developed within the EMBEDDIA project: - 12,390 Estonian articles from...
- Slovene corpus for aspect-based sentiment analysis - SentiCoref 1.0
  The SentiCoref 1.0 corpus consists of 837 documents selected from the SentiNews 1.0 corpus (http://hdl.handle.net/11356/1110). The documents were selected based on the number of...
- Automatically sentiment annotated Slovenian news corpus AutoSentiNews 1.0
  The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and...
- The multilingual sentiment dataset of parliamentary debates ParlaSent 1.0
  The dataset consists of mid-length sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom,...
- Sentiment Annotated Dataset of Croatian News
  We present a collection of sentiment annotations for news articles (article links) in the Croatian language. A set of 2025 news articles was gathered from 24sata, one of the leading...
- Emoji Sentiment Ranking 1.0
  A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages....
- News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian a...
  We provide annotated datasets on a three-point sentiment scale (positive, neutral and negative) for Serbian, Bosnian, Macedonian, Albanian, and Estonian. For all languages...
- The sentiment corpus of parliamentary debates ParlaSent-BCS v1.0
  The dataset consists of mid-length sentences from the Bosnian, Croatian and Serbian parliamentary proceedings, annotated with a 6-level sentiment schema (defined below). The...
- xLiMe Twitter Corpus XTC 1.0.1
  The xLiMe Twitter Corpus contains tweets in German, Italian and Spanish manually annotated with part-of-speech, named entities, and message-level sentiment polarity. In total,...
- Twitter sentiment for 15 European languages
  The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages....
- Manually sentiment annotated Slovenian news corpus SentiNews 1.0
  Between 2 and 6 annotators independently annotated the sentiment of a stratified random sample of 10,427 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and...