Dataset - B2FIND

Norwegian compounds and their Russian equivalents

This post contains the dataset discussed in two related publications: Nesset, Tore (2018a): When a single word is enough: Norwegian compounds and their Russian counterparts....

Replication data for: Prefix variation in путать: в-, за-, пере- and с-

This case study of the four Natural Perfectives of the Russian simplex verb путать ‘tangle’ sheds light on the following questions: Is it possible to predict the choice of...

Replication Data for: “Threat” in Russian – A Linguistic Perspective

The dataset includes examples of usages of groza and ugroza from the Russian National Corpus (RNC). The dataset covers the period from 1700 to 2020 and consists of 4858...

Replication Data for: How to threaten in Russian: a constructionist approach

This dataset concerns the data for the article that analyzes various linguistic means to carry out threats in Russian with a special focus on 27 constructions tagged as "Threat"...

Replication Data for: When modality and tense meet. The future marker budet ‘...

Dataset description: This is a study of examples of Russian impersonal constructions with the modal word možno ‘can, be possible’ with and without the future copula budet ‘will...

Parent-child conversations about motion events (Russian, Russian-German, Czech)

The dataset contains transcripts of parent-child communication over picture stimuli depicting motion events. The transcripts are partly coded and transcribed for the purpose of...

Replication Data for: Typology of reduplication in Russian: constructions wit...

We analyze repetition in Russian from the perspective of the Russian Constructicon which represents over 2200 grammatical constructions described in terms of anchors (fixed...

Replication data for: Big data in Russian linguistics? Another look at paucal...

This post contains a database of Russian numeral constructions from the RuTenTen corpus (https://www.sketchengine.co.uk/rutenten-russian-corpus/). The constructions are of the...

Replication Data for: Predicting Russian aspect by frequency across genres

We ask whether the aspect of individual verbs can be predicted based on the statistical distribution of their inflectional forms and how this is influenced by genre. To address...

Replication Data for: Russian verbal borrowings in Udmurt

This is the dataset used in a study of Russian verbal loans in Udmurt. The files contain lists of Russian verbs found in the Udmurt social media corpus...

Replication Data for: A network of allostructions: quantified subject constru...

Data and R code are provided for statistical analysis of approximately 39,000 corpus examples of predicate agreement in constructions with quantified subjects in Russian. The...

Replication Data for: Less is More: Why All Paradigms are Defective, and Why ...

Only a fraction of lexemes are encountered in all their paradigm forms in any corpus or even in the lifetime of any speaker. This raises a question as to how it is that native...

Replication Data for: The decade construction rivalry in Russian: Using a cor...

This dataset contains 3 data files, 5 files with R code, and a short read-me file with documentation. The data files contain information about the development of two competing...

Replication Data for: The acquisition of the English dative alternation by Ru...

Dataset abstract: The dataset contains the ratings for a 100-split task performed by Russian learners of English. 272 Russian learners were subdivided into two groups. One...

Replication Data for: A corpus approach to the history of Russian po delimita...

This paper gives an example of how enriched diachronic treebank data can shed new light on an old and conflicted topic, even when that topic is morphological and semantic in...

SimDiK

Data from the SimDiK project.

Kuzmina Archive - Manuscripts

This record comprises the digitized manuscripts collected by Angelina Ivanovna Kuzmina (1924–2002) between 1962 and 1977 plus additional structured...

EXMARaLDA Demo corpus 1.1

A selection of short audio and video recordings in various languages to be used for instruction or demonstration of the EXMARaLDA system. The EXMARaLDA Demo Corpus is a small...

Community Interpreting Database Pilot Corpus (ComInDat)

Audio and video recordings of various types of community interpreted discourse (doctor-patient communication, simulated doctor-patient communication, courtroom communication) in...

39 datasets found