Replication Data for: Less is More: Why All Paradigms are Defective, and Why that is a Good Thing

Only a fraction of lexemes are encountered in all their paradigm forms in any corpus or even in the lifetime of any speaker. This raises a question as to how it is that native speakers confidently produce and comprehend word forms that they have never witnessed. We present the results of an experiment using a recurrent neural network computational learning model. In particular, we compare the model’s production of unencountered forms using two types of training data: full paradigms vs. single word forms for Russian nouns, verbs, and adjectives. In the long run, the model displays better performance when exposed to the more naturalistic training on single word forms, even though the other training data is much larger as it includes full paradigms for each and every word. We discuss why “defective” paradigms may be better for human learners as well. This post contains data and R code for the grammatical profiles of Russian nouns and the correspondence analysis carried out in Section 3 of the article.

Identifier
DOI https://doi.org/10.18710/VDWPZS
Related Identifier https://doi.org/10.1515/cllt-2018-0031
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/VDWPZS
Provenance
Creator Janda, Laura A; Tyers, Francis M.
Publisher DataverseNO
Contributor Janda, Laura A; UiT The Arctic University of Norway; The Tromsø Repository of Language and Linguistics
Publication Year 2018
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Janda, Laura A (UiT The Arctic University of Norway)
Representation
Resource Type corpus data; Dataset
Format type/x-r-syntax; application/pdf; text/tab-separated-values
Size 2697; 33515; 32472; 9538; 30562; 24627; 25846; 7520
Version 1.1
Discipline Humanities