Dataset - B2FIND

Background data for: Some obstacles to replication in corpus linguistics

This dataset contains tabular files recording occurrences and frequencies of modal verbs in the Brown family corpora; nine modal verbs (can, could, may, might, must, shall,...

Replication Data for: Understanding ‘many’ through the lens of Ukrainian багато

Dataset description: The General Regionally Annotated Corpus of Ukrainian (GRAC, Shvedova et al. 2017-2024, uacorpus.org) was consulted to collect data for further analysis...

Replication Data for: “Threat” in Russian – A Linguistic Perspective

The dataset includes examples of usages of groza and ugroza from the Russian National Corpus (RNC). The dataset covers the period from 1700 to 2020 and consists of 4858...

Background data (adapted from Jenset & McGillivray 2017) for: Down-sampling f...

Dataset description: This dataset, which is adapted from Jenset and McGillivray (2017), contains tabular files documenting the alternating usage of -(e)th and -(e)s to mark...

Replication Data for: A network of allostructions: quantified subject constru...

Data and R code are provided for statistical analysis of approximately 39,000 corpus examples of predicate agreement in constructions with quantified subjects in Russian. The...

Replication Data for: Subject Placement in the History of Latin

The present dataset was used in a corpus study on the diachrony of subject placement in the history of Latin, to appear in 'Catalan Journal of Linguistics'. The main file...