-
Effects of the German minimum wage on earnings and working time using establi...
This study examines the short-term effects of the introduction of a statutory minimum wage in Germany on hourly wages, monthly wages and paid working hours. We exploit a novel... -
DiaBiz ASR benchmark
An evaluation report with accompanying datasets benchmarking the performance of commercially available ASR services of Polish on the DiaBiz corpus. -
Semantic change detection datasets for Slovenian 1.0
This dataset is meant for evaluation of systems for semantic change detection in Slovenian. The "semantic_shift_gs_dataset" folder contains 3 files: 1)... -
The news articles reporting on the 2021 Tokyo Olympics data set OG2021 (resea...
The OG2021 corpus contains multilingual news articles reporting on events during the 2021 Tokyo Olympics. The data set was created to evaluate the... -
SloBENCH evaluation framework
The evaluation framework contains public evaluation scripts. All the scripts contain additional Dockerfiles that allow for platform-independent evaluation and exact comparison... -
The news articles reporting on the 2021 Tokyo Olympics data set OG2021 (public)
The OG2021 corpus contains multilingual news articles reporting on events during the 2021 Tokyo Olympics. The data set was created to evaluate the... -
SimLex-999 Slovenian translation SimLex-999-sl 1.0
The resource contains the English SimLex-999 word pairs (Hill et al. 2015) and their Slovene translations. In the translation process, the word pairs were first translated by two translators... -
Dataset for evaluation of Slovene spell- and grammar-checking tools Šolar-Eva...
Šolar-Eval is a specialized dataset designed for the evaluation of Slovene spell- and grammar-checking tools and methodologies. It encompasses 109 essays authored by Slovene... -
A Resource for Evaluating Graded Word Similarity in Context: CoSimLex
The dataset contains human similarity ratings for pairs of words. The annotators were presented with contexts that contained both of the words in the pair and the dataset... -
Manually Ranked Translation Outputs
Manually ranked outputs of Czech-Slovak translations. Three annotators manually ranked outputs of five MT systems (Česílko, Česílko2, Google Translate and two Moses setups) on... -
Optimal Reference Translations from English to Czech
This corpus contains annotations of translation quality from English to Czech in seven categories at both the segment and the document level. There are 20 documents in total, each with... -
Machine Translation Testsuite for Gender-Consistent Translation
Document-level testsuite for evaluation of gender translation consistency. Our Document-Level test set consists of selected English documents from the WMT21 newstest annotated... -
COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons
Costra 1.1 is a new dataset for testing geometric properties of sentence embedding spaces. In particular, it concentrates on examining how well sentence embeddings capture... -
WordSim353-cs: Evaluation Dataset for Lexical Similarity and Relatedness, bas...
Czech translation of WordSim353. The Czech translations of the English WordSim353 word pairs were obtained from four translators. All translation variants were scored according to... -
LongEval Click-Model Relevance Judgements (Qrels)
The collection comprises the relevance judgments used in the 2023 LongEval Information Retrieval Lab (https://clef-longeval.github.io/), organized at CLEF. It consists of three... -
Digital humanities: Introduction. A 10-week course with practical sessions.
The aim of the course is to introduce digital humanities and to describe various aspects of digital content processing. The course consists of 10 lessons with video material and... -
LEONIDE - Longitudinal Learner Corpus in Italiano, Deutsch and English 1.1
LEONIDE is a longitudinal corpus of student essays documenting the language competences and writing development of lower secondary school students in three different languages.... -
Evaluation of Group Creative Activities - Aphasia New Music Group co-producti...
Aim: To co-produce a method of evaluating the impact of creative music-making on the lives of people with aphasia (language processing difficulties acquired through brain... -
Korkeakoulujen tutkimustoiminnan dynamiikka ja tuloksellisuus 1990 (Dynamics and Productivity of Research Activities in Higher Education Institutions 1990)
The dataset contains interviews with researchers from six Finnish university departments. The main themes of the dataset are the organisation of the departments' research activities,... -
Argumentin laadun arviointitutkimus 2018 (Argument Quality Assessment Study 2018)
The dataset examines how the person presenting a claim affects respondents' assessment of the argument's validity. The topics of the arguments are protection against dismissal, wind power construction,...