-
The dataset is a tab-separated values (TSV) file with five columns. The first two columns hold the codes of the two emojis in the evaluated pair, the third their gold-standard similarity, the fourth their gold-standard relatedness, and the fifth the average of those two scores. Each row gives the gold-standard evaluation of one emoji pair. To retrieve the vector embedding of an emoji from our models, add the token "eoji" before the emoji code.
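
As a rough illustration of the layout described above, the following Python sketch reads the five columns and builds the lookup token for each emoji. Several details are assumptions rather than facts about the dataset: the function names `read_pairs` and `embedding_key` are hypothetical, the "eoji" token is assumed to be joined directly to the code with no separator, and rows whose score columns are not numeric (for example a header line, if the file has one) are simply skipped. The sample row uses placeholder codes and scores, not values from the dataset.

```python
import csv
import io

EMOJI_PREFIX = "eoji"  # token that must precede the emoji code in the models


def embedding_key(code):
    """Build the model lookup token (assumes plain concatenation)."""
    return EMOJI_PREFIX + code


def read_pairs(handle):
    """Yield one dict per well-formed row of the five-column TSV."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        code_a, code_b, similarity, relatedness, average = row
        try:
            scores = (float(similarity), float(relatedness), float(average))
        except ValueError:
            continue  # assumption: a non-numeric row is a header
        yield {
            "key_a": embedding_key(code_a),
            "key_b": embedding_key(code_b),
            "similarity": scores[0],
            "relatedness": scores[1],
            "average": scores[2],
        }


# Placeholder row for demonstration only; not taken from the dataset.
sample = io.StringIO("1f602\t1f62d\t3.0\t2.0\t2.5\n")
for pair in read_pairs(sample):
    print(pair["key_a"], pair["key_b"], pair["average"])
```

To read the real file, pass an open file handle instead of the sample, for example `open(path, newline="", encoding="utf-8")`, where `path` points to your copy of the TSV.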