-
Monitor corpus of Slovene Trendi 2024-12
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-12 covers the period from January... -
Monitor corpus of Slovene Trendi 2024-11
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-11 covers the period from January... -
Monitor corpus of Slovene Trendi 2024-04
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 73 publishers. Trendi 2024-04 covers the period from January... -
Annotated corpus of Macedonian language-related news articles MetaLangNEWS-Mk
A comprehensive corpus of news articles on the topic of language, published in major Macedonian daily newspapers and news portals in the five-year period of January 1, 2015 -... -
Monitor corpus of Slovene Trendi 2024-10
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-10 covers the period from January... -
Monitor corpus of Slovene Trendi 2024-03
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 70 publishers. Trendi 2024-03 covers the period from January... -
Monitor corpus of Slovene Trendi 2024-02
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 70 publishers. Trendi 2024-02 covers the period from January... -
The news dataset for discriminating between Bosnian, Croatian and Serbian SET...
The SETimes.HBS dataset consists of parallel documents written in Bosnian, Croatian and Serbian, harvested from the now-inactive setimes.com website, which published news in the... -
Monitor corpus of Slovene Trendi 2024-07
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 74 publishers. Trendi 2024-07 covers the period from January... -
Latvian Delfi article archive (in Latvian and Russian) 1.0
This dataset is an archive of articles from the Delfi news site from 2015 to 2019, containing over 180,000 articles (roughly 50% in Latvian and 50% in Russian). Keywords... -
Monitor corpus of Slovene Trendi 2024-09
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 76 publishers. Trendi 2024-09 covers the period from January... -
Automatically sentiment annotated Slovenian news corpus AutoSentiNews 1.0
The corpus contains 256,567 documents from the Slovenian news portals 24ur, Dnevnik, Finance, Rtvslo, and Žurnal24. These portals contain political, business, economic and... -
Monitor corpus of Slovene Trendi 2024-08
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 107 media websites, published by 77 publishers. Trendi 2024-08 covers the period from January... -
Slovenian keyword extraction dataset from SentiNews 1.0
The dataset consists of 7514 Slovenian news articles from the SentiNews 1.0 corpus by Bučar et al. 2017 (http://hdl.handle.net/11356/1110) which had available article keywords.... -
The news articles reporting on the 2021 Tokyo Olympics data set OG2021 (resea...
The OG2021 corpus contains multilingual news articles reporting on events during the 2021 Tokyo Olympics. The data set was created to evaluate the... -
Ekspress news article archive (in Estonian and Russian) 1.0
The dataset is an archive of articles from the Ekspress Meedia news site from 2009 to 2019, containing over 1.4M articles, mostly in Estonian (1,115,120 articles) with... -
Monitor corpus of Slovene Trendi 2023-12
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 70 publishers. Trendi 2023-12 covers the period from January... -
Monitor corpus of Slovene Trendi 2024-05
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 73 publishers. Trendi 2024-05 covers the period from January... -
24sata news article archive 1.0
The 24sata news portal consists of a portal with daily news and several smaller portals covering news from specific topics, such as automotive news, health, culinary content,... -
Sentiment Annotated Dataset of Croatian News
We present a collection of sentiment annotations for news articles (article links) in Croatian. A set of 2025 news articles was gathered from 24sata, one of the leading...