EmoTwi50 [research data]



The dataset is a tab-separated values (TSV) file with five columns. The first two columns hold the codes of the two emojis being compared, the third their gold standard similarity, the fourth their gold standard relatedness, and the fifth the average of those two scores. Each row gives the gold standard evaluation for one pair of emojis. To retrieve the vector embedding of an emoji from our models, prefix the emoji code with the token "eoji".
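As a rough illustration of the layout described above, the following Python sketch reads the five columns and builds the lookup key for an embedding. Several details are assumptions rather than facts about the dataset: the names `EmojiPair`, `read_pairs` and `embedding_key` are hypothetical, the file is assumed to use plain numeric scores, any row whose score columns are not numeric is treated as a header and skipped, and the "eoji" token is assumed to be joined directly to the emoji code with no separator.

```python
import csv
import io
from typing import NamedTuple

# Token quoted in the dataset description; joining it to the code with no
# separator is an assumption.
EMBEDDING_PREFIX = "eoji"


class EmojiPair(NamedTuple):
    code_a: str
    code_b: str
    similarity: float
    relatedness: float
    average: float


def read_pairs(handle):
    """Parse five-column TSV rows into EmojiPair records."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        try:
            sim, rel, avg = (float(value) for value in row[2:])
        except ValueError:
            continue  # assumption: a non-numeric row is a header line
        pairs.append(EmojiPair(row[0], row[1], sim, rel, avg))
    return pairs


def embedding_key(code):
    """Return the vocabulary key for an emoji code (assumed format)."""
    return EMBEDDING_PREFIX + code


# Placeholder row with invented codes and scores, not taken from the dataset.
sample = "1f602\t1f62d\t3.0\t4.0\t3.5\n"
for pair in read_pairs(io.StringIO(sample)):
    print(embedding_key(pair.code_a), embedding_key(pair.code_b), pair.average)
```

To read the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever the downloaded TSV was saved.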

DOI https://doi.org/10.34810/data483
Creator Barbieri, Francesco; Ronzano, Francesco; Saggion, Horacio
Publisher CORA.Repositori de Dades de Recerca
Publication Year 2023
Funding Reference European Commission 611383 ; Ministerio de Economía y Competitividad MDM-2015-0502
