-
Keyword extraction datasets for Croatian, Estonian, Latvian and Russian 1.0
EACL Hackashop Keyword Challenge Datasets In this repository you can find ids of articles used for the keyword extraction challenge at EACL Hackashop on News Media Content... -
Slovenian keyword extraction dataset from SentiNews 1.0
The dataset consists of 7514 Slovenian news articles from the SentiNews 1.0 corpus by Bučar et al. 2017 (http://hdl.handle.net/11356/1110) which had available article keywords.... -
EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news a...
This dataset contains articles from EMBEDDIA Media partners with various information added by the tools developed within the EMBEDDIA project: - 12,390 Estonian articles from... -
OAGK Keyword Generation Dataset
OAGK is a keyword extraction/generation dataset consisting of 2.2 million abstracts, titles and keyword strings from cientific articles. Texts were lowercased and tokenized with... -
OAGKX Keyword Generation Dataset
OAGKX is a keyword extraction/generation dataset consisting of 22674436 abstracts, titles and keyword strings from scientific articles. The texts were lowercased and tokenized... -
KER - Keyword Extractor
KER is a keyword extractor that was designed for scanned texts in Czech and English. It is based on the standard tf-idf algorithm with the idf tables trained on texts from...