Opinion corpus of Slovene web commentaries KKS 1.001

Dataset

The corpus of web commentaries with sentiment categorizations was developed as part of a BSc thesis (Kadunc, 2016) and was used to evaluate the Slovene Sentiment Lexicon KSS (http://hdl.handle.net/11356/1097). It contains web commentaries on various topics (business, politics, sport, and other) from four Slovene web portals (RtvSlo, 24ur, Finance, Reporter). The corpus is in XML format and is available in two forms:
- original corpus, containing 4,777 commentaries: 898 positive, 3,291 negative and 588 neutral;
- balanced corpus, a subset of the original corpus, containing 1,740 commentaries: 580 for each sentiment class (positive, negative and neutral).
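The class counts quoted above can be checked, and a balanced subset of the same shape produced, with a short sketch like the following. It is only an illustration: the names `ORIGINAL_COUNTS`, `PER_CLASS`, `class_shares` and `balance` are hypothetical, the items are placeholders rather than real commentaries, and random downsampling is an assumption, since the description does not say how the balanced corpus was selected.

```python
import random

# Counts reported in the corpus description.
ORIGINAL_COUNTS = {"positive": 898, "negative": 3291, "neutral": 588}
PER_CLASS = 580  # commentaries per sentiment class in the balanced corpus


def class_shares(counts):
    """Return each label's share of the total as a fraction."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balance(items, per_class, seed=0):
    """Randomly downsample (label, text) pairs to per_class items per label.

    Random downsampling is an assumption; the source does not state how
    the balanced subset was chosen.
    """
    rng = random.Random(seed)
    by_label = {}
    for label, text in items:
        by_label.setdefault(label, []).append((label, text))
    balanced = []
    for label, group in by_label.items():
        if len(group) < per_class:
            raise ValueError(f"not enough {label} items: {len(group)}")
        balanced.extend(rng.sample(group, per_class))
    return balanced


# Placeholder items standing in for the real commentaries.
items = [
    (label, f"{label}-{i}")
    for label, n in ORIGINAL_COUNTS.items()
    for i in range(n)
]
subset = balance(items, PER_CLASS)
print(sum(ORIGINAL_COUNTS.values()), len(subset))
print({k: round(v, 3) for k, v in class_shares(ORIGINAL_COUNTS).items()})
```

The totals come out as 4,777 and 1,740, matching the description. The shares show why a balanced variant is useful: roughly 69% of the original commentaries are negative, so a classifier that always predicts "negative" would already score well on the original corpus.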

References:
Klemen Kadunc (2016). Določanje sentimenta slovenskim spletnim komentarjem s pomočjo strojnega učenja. Diplomsko delo. Univerza v Ljubljani, Fakulteta za računalništvo in informatiko (in Slovene). http://eprints.fri.uni-lj.si/3317/
Klemen Kadunc, Marko Robnik-Šikonja (2016). Analiza mnenj s pomočjo strojnega učenja in slovenskega leksikona sentimenta. Conference on Language Technologies & Digital Humanities, Ljubljana (in Slovene). http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/

Identifier
PID	http://hdl.handle.net/11356/1115
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1115
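
The Metadata Access link is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the handle and, if the repository is still reachable at that address, the Dublin Core titles read from the response. The function names `get_record_url` and `fetch_titles` are hypothetical; only the base URL, the `oai_dc` prefix and the identifier form are taken from the record above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "http://www.clarin.si/repository/oai/request"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def get_record_url(handle, metadata_prefix="oai_dc"):
    """Build the OAI-PMH GetRecord URL for a handle such as '11356/1115'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # Keep ':' and '/' unescaped so the URL matches the form in the record.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


def fetch_titles(handle):
    """Fetch the record and return its dc:title values (needs network access)."""
    with urlopen(get_record_url(handle), timeout=30) as response:
        root = ET.fromstring(response.read())
    return [el.text for el in root.iter(f"{{{DC_NS}}}title")]


print(get_record_url("11356/1115"))
```

`fetch_titles` is not called here because it depends on the server being available; `get_record_url` alone reproduces the Metadata Access URL listed above.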

Provenance
Creator	Kadunc, Klemen; Robnik-Šikonja, Marko
Publisher	Faculty of Computer and Information Science, University of Ljubljana
Publication Year	2017
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); PUB; https://creativecommons.org/licenses/by/4.0/
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	text/plain; charset=utf-8; application/zip; application/pdf; downloadable_files_count: 4
Discipline	Linguistics