Czech SubLex 1.0

PID

Czech subjectivity lexicon, i.e. a list of subjectivity clues for sentiment analysis in Czech. The list contains 4626 evaluative items (1672 positive and 2954 negative) together with their part of speech tags, polarity orientation and source information. The core of the Czech subjectivity lexicon has been gained by automatic translation of a freely available English subjectivity lexicon downloaded from http://www.cs.pitt.edu/mpqa/subj_lexicon.html. For translating the data into Czech, we used parallel corpus CzEng 1.0 containing 15 million parallel sentences (233 million English and 206 million Czech tokens) from seven different types of sources automatically annotated at surface and deep layers of syntactic representation. Afterwards, the lexicon has been manually refined by an experienced annotator.

Identifier
PID http://hdl.handle.net/11858/00-097C-0000-0022-FF60-B
Related Identifier http://ufal.mff.cuni.cz/seance
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11858/00-097C-0000-0022-FF60-B
Provenance
Creator Veselovská, Kateřina; Bojar, Ondřej
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2013
Rights Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0); http://creativecommons.org/licenses/by-nc-sa/3.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech
Resource Type lexicalConceptualResource
Format application/octet-stream; application/pdf; text/plain; charset=utf-8; downloadable_files_count: 3
Discipline Linguistics