A Resource for Evaluating Graded Word Similarity in Context: CoSimLex

Dataset

The dataset contains human similarity ratings for pairs of words. Annotators were shown each pair within a context that contained both words, and the dataset provides two different contexts per pair. The words were sourced from the English, Croatian, Finnish and Slovenian versions of the original SimLex-999 dataset.

Identifier
PID	http://hdl.handle.net/11356/1308
Related Identifier	https://arxiv.org/abs/1912.05320
Related Identifier	https://www.aclweb.org/anthology/2020.lrec-1.720/
Related Identifier	http://embeddia.eu/
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1308
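
The Metadata Access URL above follows the OAI-PMH GetRecord pattern. As a minimal sketch, the Python below rebuilds that URL from the handle and extracts Dublin Core fields from a response body. The helper names (build_getrecord_url, parse_dc_fields, fetch_record) are illustrative and not part of any CLARIN.SI tooling, and which fields the repository actually returns is an assumption to check against a live response.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Base endpoint taken from the Metadata Access URL in this record.
OAI_BASE = "http://www.clarin.si/repository/oai/request"
# Namespace of the Dublin Core element set used by the oai_dc format.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_getrecord_url(handle, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for a handle such as '11356/1308'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return OAI_BASE + "?" + urlencode(params, safe=":/")


def parse_dc_fields(xml_data):
    """Collect Dublin Core elements from an OAI-PMH response into a dict of lists."""
    fields = {}
    for elem in ET.fromstring(xml_data).iter():
        if elem.tag.startswith(DC_NS) and elem.text and elem.text.strip():
            name = elem.tag[len(DC_NS):]
            fields.setdefault(name, []).append(elem.text.strip())
    return fields


def fetch_record(handle):
    """Download and parse one record (requires network access)."""
    with urlopen(build_getrecord_url(handle), timeout=30) as response:
        return parse_dc_fields(response.read())
```

Calling fetch_record("11356/1308") would return a dictionary keyed by Dublin Core element name (for example "creator" or "language"), each mapped to a list of values, provided the endpoint is reachable and responds in the oai_dc format.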

Provenance
Creator	Armendariz, Carlos; Purver, Matthew; Ulčar, Matej; Pollak, Senja; Ljubešić, Nikola; Robnik-Šikonja, Marko; Granroth-Wilding, Mark; Vaik, Kristiina
Publisher	Queen Mary University of London
Publication Year	2020
Funding Reference	info:eu-repo/grantAgreement/EC/H2020/825153
Rights	GNU General Public Licence, version 3; https://opensource.org/licenses/GPL-3.0; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	English; Croatian; Finnish; Slovenian
Resource Type	lexicalConceptualResource
Format	text/csv; application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 5
Discipline	Linguistics