A behavioural dataset for studying individual differences in language skills 2019


This resource contains data from 112 Dutch adults (18-29 years of age) who completed the Individual Differences in Language Skills test battery, which comprised 33 behavioural tests assessing language skills and the domain-general cognitive skills likely involved in language tasks. The battery included tests measuring linguistic experience (e.g., vocabulary size, prescriptive grammar knowledge), general cognitive skills (e.g., working memory, non-verbal intelligence) and linguistic processing skills (word production/comprehension, sentence production/comprehension). Testing was done in a lab-based setting, resulting in high-quality data thanks to tight monitoring of the experimental protocol and the use of software and hardware optimized for behavioural testing. Each participant completed the battery twice (i.e., two test days of four hours each). Raw data from all tests on both days, as well as pre-processed data, are provided.

Investigations of the psychological, social and biological foundations of human speech and language have largely ignored individual differences in the normal range of abilities. For decades, experimental research in this field has almost exclusively involved college students, and most of it has aimed to characterise the average performance of this limited pool of participants. Given such a narrow focus, hardly anything is known about individual differences in language skills within this group, or among adult speakers and listeners more generally. The long-term goal of the Big Question 4 project is to characterise the variability in language skills in large, demographically representative samples of young adults and to chart the neurobiological and genetic underpinnings of this behavioural variability.

Behavioural data were collected using the Individual Differences in Language Skills (IDLaS) test battery, which included tests measuring linguistic and non-linguistic skills. The tests covered three broad domains: (1) linguistic experience, i.e., the knowledge acquired through an individual's use of language (e.g., vocabulary, normative rules) and the frequency of language exposure (e.g., reading frequency); (2) general cognitive skills, capturing variability in non-verbal skills that have been implicated in language processing (e.g., processing speed, working memory, non-verbal intelligence); and (3) linguistic processing skills, capturing variability in the four main tasks that individuals carry out when using language (i.e., word- and sentence-level processing in comprehension and in production). A detailed description of the materials and procedure for each test can be found in the corresponding test folder.

Design and general procedure

Participants were tested in groups of up to eight individuals at a time in a quiet room of about 30 m2 at the Max Planck Institute for Psycholinguistics. Each participant was seated at a desk with an experimental laptop, a computer mouse and a custom-made two-button box in front of them. The experimental laptops were Hewlett-Packard ProBook 640 G1 machines with 14-inch screens, running Windows 7 and optimized for experimentation. Participants were seated in a semicircle around the room facing the wall, with approximately 1 m to 1.5 m between them. Noise-cancelling divider walls (height 1.80 m, width 1 m) were placed between the participants' desks, and the walls in front of them were covered with curtains to absorb as much noise as possible. Beyerdynamic DT 790 headsets were used to present the auditory stimuli and to record participants' speech. These headsets are commonly used in TV broadcasting and are known to shield environmental noise quite effectively.
The headsets also come with high-quality close-to-mouth microphones. For the speech production tests, participants were additionally equipped with earplugs, worn underneath the headsets, to ensure that participants' own speech or speech planning was not influenced by overhearing other participants. Participants could still monitor their own speech via bone conduction. Most participants indicated that they could not understand what the others were saying and reported that the discomfort of wearing earplugs while speaking was minimal. Speech was recorded at a sampling rate of 44100 Hz with 16-bit resolution. The tests were either implemented in Presentation© (version 20.0, www.neurobs.com) and run locally on the laptops, or implemented as web applications in Frinex (a framework for interactive experiments developed by the technical group at the Max Planck Institute for Psycholinguistics) and run in the laptops' web browser (Chrome, version 75.0.3770.142). Specifically, all tests where exact timing was critical (e.g., reaction-time (RT) tests) were run in Presentation, while the remaining tests were implemented in Frinex (see Tables 1 and 2 for an overview). As Frinex had been developed only recently, no reliable data were available concerning its timing precision (i.e., the time stamping of auditory, visual and response events). A test day started at 9.30 a.m., ended at 3.00 p.m. and was divided into four sessions: two featuring Presentation experiments and two featuring Frinex experiments, with one session of each kind run in the morning and one in the afternoon. Session length varied between 45 and 70 minutes depending on how quickly participants carried out the tests. Between sessions 1 and 2, and between sessions 3 and 4, participants were invited to take breaks of about 15-20 minutes; between sessions 2 and 3, they had a lunch break of 45 minutes.
As is common practice in individual differences research, the order of tests (see Tables 1 and 2) and the order of trials within each test were the same for every participant, to minimize potential influences of the test procedure on participants' performance. As many of the tests were newly developed, no data on test-retest reliability were available. Therefore, all participants were tested twice, with approximately one month between test days (on average 33 days, SD = 8, range = 24-93). The procedure on the second day was identical to that of the first, except that participants did not fill out the intake questionnaire again at the beginning of the first session. Participant codes (UUIDs) were augmented with a '_2' extension on the second test day. Picture and text stimuli (font: Calibri 17; font colour: RGB 0, 102, 102, a green/blue) were presented against a white background. Unless stated otherwise, auditory stimuli had a sampling rate of 44100 Hz with 16-bit resolution. We used the Dutch Subtlex and prevalence databases to retrieve the stimulus words' frequency and prevalence, respectively.
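For users working with the raw files, the '_2' suffix on participant codes is the only marker distinguishing the two test days. The following minimal sketch shows one way to split records by test day; the function name and the example codes are illustrative assumptions, not part of the dataset.

```python
# Hypothetical sketch: separating test-day-1 from test-day-2 records.
# The dataset marks second-day sessions by appending "_2" to the
# participant UUID; everything else here is an illustrative assumption.

def split_by_test_day(participant_codes):
    """Group codes into day-1 and day-2 lists based on the '_2' suffix,
    stripping the suffix so the same base UUID appears in both lists."""
    day1, day2 = [], []
    for code in participant_codes:
        if code.endswith("_2"):
            day2.append(code.removesuffix("_2"))  # recover the base UUID
        else:
            day1.append(code)
    return day1, day2

# Example with made-up codes:
codes = ["a1b2c3", "a1b2c3_2", "d4e5f6"]
print(split_by_test_day(codes))  # (['a1b2c3', 'd4e5f6'], ['a1b2c3'])
```

Matching the two lists by base UUID then yields the pairs of day-1/day-2 observations needed for test-retest reliability analyses.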

Identifier
DOI https://doi.org/10.5255/UKDA-SN-854399
Metadata Access https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=880ec73fcb138bff35e96bdff5a7a84c2fb22ff221e7499fe8829679f3266afa
Provenance
Creator Hintz, F, Max Planck Institute for Psycholinguistics; Dijkhuis, M, Max Planck Institute for Psycholinguistics; van ‘t Hoff, V, Max Planck Institute for Psycholinguistics; McQueen, J, Radboud University; Meyer, A, Max Planck Institute for Psycholinguistics
Publisher UK Data Service
Publication Year 2020
Funding Reference Netherlands Organisation for Scientific Research (NWO)
Rights Florian Hintz, Max Planck Institute for Psycholinguistics. James M. McQueen, Radboud University. Antje S. Meyer, Max Planck Institute for Psycholinguistics; The Data Collection is available for download to users registered with the UK Data Service. Commercial use of data is not permitted.
OpenAccess true
Representation
Language English
Resource Type Numeric; Text; Audio
Discipline Humanities; Linguistics; Psychology; Social and Behavioural Sciences
Spatial Coverage Nijmegen; Netherlands