Replication Data for: A panorama of inchoative constructions in Spanish: Cluster analysis as an answer to the near-synonymy puzzle.

The dataset contains the data for the hierarchical cluster analysis explained in the article "A panorama of inchoative constructions in Spanish: Cluster analysis as an answer to the near-synonymy puzzle". In total, the dataset contains 3955 observations, which are tokens of the inchoative construction with the following auxiliaries: comenzar, empezar, meter, poner, echar(se), liar, arrancar and romper. The data originate from the Spanish Web corpus (esTenTen18), accessed via Sketch Engine; only the European Spanish subcorpus was selected. The search syntax used to detect the inchoative construction was the following: “[lemma="empezar"] [tag="R."]{0,3}"a"[tag="V."] within " (replacing the concrete lemma "empezar" with the relevant lemma for each auxiliary; see Spinc_queries_20221202.txt for all concrete corpus queries). After samples of 10,000 tokens per auxiliary were downloaded, the samples were manually cleaned, and only 500 tokens per auxiliary were retained in the dataset. Next, the data were annotated for the infinitive observed after the preposition 'a' and for the semantic class to which this infinitive belongs, following the existing ADESSE classification (see below), as well as for other criteria that are not taken into account in this study. Concretely, the variables 'INF' (infinitive) and 'Class' were used as input for the hierarchical cluster analysis (see the data-specific sections below for more information about the variables).
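
To make the role of the 'Class' variable more concrete, the following is a minimal Python sketch of how per-auxiliary class profiles could feed an agglomerative (hierarchical) clustering. It is not the author's script: the replication files are listed in R syntax, and the distance measure and linkage method used in the article are not stated in this record, so Euclidean distance and average linkage are assumptions made here. The column name "AUX", the class labels "class_A" and "class_B", and all rows are invented for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import dist

# Toy rows invented purely for illustration; NOT taken from the dataset.
# "AUX" is a hypothetical column name; "Class" is the variable named above.
# "class_A" / "class_B" are placeholders, not real ADESSE class labels.
rows = (
    [{"AUX": "empezar", "Class": c} for c in ("class_A", "class_A", "class_B")]
    + [{"AUX": "comenzar", "Class": c} for c in ("class_A", "class_A", "class_B")]
    + [{"AUX": "romper", "Class": c} for c in ("class_B", "class_B")]
    + [{"AUX": "echar", "Class": c} for c in ("class_A", "class_B", "class_B", "class_B")]
)


def class_profiles(rows):
    """Return {auxiliary: list of relative frequencies, one per semantic class}."""
    classes = sorted({r["Class"] for r in rows})
    counts = defaultdict(Counter)
    for r in rows:
        counts[r["AUX"]][r["Class"]] += 1
    profiles = {}
    for aux, counter in counts.items():
        total = sum(counter.values())
        profiles[aux] = [counter[k] / total for k in classes]
    return profiles


def average_linkage(profiles):
    """Agglomerative clustering (assumed: Euclidean distance, average linkage).

    Returns the merges in the order they happen as (cluster_a, cluster_b, distance).
    """
    def avg_dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((a, b, avg_dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges


if __name__ == "__main__":
    for a, b, d in average_linkage(class_profiles(rows)):
        print(a, "+", b, f"(distance {d:.3f})")
```

With these invented rows, comenzar and empezar are merged first because their toy profiles are identical; this says nothing about the real data. Relative rather than absolute frequencies are used so that auxiliaries with different numbers of retained tokens remain comparable, which is again a design choice of this sketch rather than a documented property of the original analysis.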

Identifier
DOI https://doi.org/10.18710/DR8QKQ
Related Identifier https://doi.org/10.21825/kzm.87036
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/DR8QKQ
Provenance
Creator Van Hulle, Sven
Publisher DataverseNO
Contributor Van Hulle, Sven; Ghent University; The Tromsø Repository of Language and Linguistics
Publication Year 2023
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Van Hulle, Sven (Ghent University)
Representation
Resource Type corpus data; Dataset
Format text/plain; text/csv; type/x-r-syntax
Size 8147; 80915; 62008; 1260; 2182
Version 1.1
Discipline Humanities