Replication Data for: A panorama of inchoative constructions in Spanish: Cluster analysis as an answer to the near-synonymy puzzle.

Dataset

The dataset contains the data for the hierarchical cluster analysis described in the article "A panorama of inchoative constructions in Spanish: Cluster analysis as an answer to the near-synonymy puzzle". In total, it contains 3955 observations, which are tokens of the inchoative construction with the following auxiliaries: comenzar, empezar, meter, poner, echar(se), liar, arrancar and romper.

The data originate from the Spanish Web corpus (esTenTen18), accessed via Sketch Engine; only the European Spanish subcorpus was selected. The inchoative construction was retrieved with the following search syntax: “[lemma="empezar"] [tag="R."]{0,3}"a"[tag="V."] within ” (with the lemma "empezar" replaced by the lemma of each other auxiliary; see Spinc_queries_20221202.txt for all concrete corpus queries). Samples of 10,000 tokens per auxiliary were downloaded and cleaned manually, and only 500 tokens per auxiliary were retained in the dataset.

Next, the data were annotated for the infinitive that follows the preposition 'a' and for the semantic class to which this infinitive belongs, following the existing ADESSE classification (see below), along with other criteria that are not taken into account in this study. Specifically, the variables 'INF' (infinitive) and 'Class' were used as input for the hierarchical cluster analysis (see the data-specific sections below for more information about these variables).
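This record does not spell out the clustering settings used in the article. As a minimal sketch only, assuming that each auxiliary is represented by the relative frequencies of the semantic classes ('Class') of its infinitives, compared with Euclidean distance and merged with average linkage (assumptions, not necessarily the author's choices), a hierarchical cluster analysis can be written in plain Python. The class labels (class_A, class_B, class_C) and token counts are hypothetical toy values, not values from the dataset; the same profiling could be built from 'INF' instead of 'Class'.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy tokens (auxiliary, semantic class of the infinitive).
# Class labels and counts are illustrative only; they are NOT dataset values.
TOKENS = (
    [("empezar", "class_A")] * 3 + [("empezar", "class_B")]
    + [("comenzar", "class_A")] * 2 + [("comenzar", "class_B")] * 2
    + [("romper", "class_C")] * 4
    + [("echar", "class_C")] * 4 + [("echar", "class_B")]
)


def class_profiles(tokens):
    """Map each auxiliary to the relative frequency of every semantic class."""
    counts = {}
    for aux, cls in tokens:
        counts.setdefault(aux, Counter())[cls] += 1
    classes = sorted({cls for _, cls in tokens})
    # Counter returns 0 for classes an auxiliary never occurs with.
    return {
        aux: [c[cls] / sum(c.values()) for cls in classes]
        for aux, c in counts.items()
    }


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Naive agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    from the first (closest) merge to the last.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Average of all pairwise distances between members of a and b.
                d = sum(euclidean(profiles[x], profiles[y]) for x in a for y in b)
                d /= len(a) * len(b)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    for a, b, d in average_linkage(class_profiles(TOKENS)):
        print(a, "+", b, f"at {d:.3f}")
```

With these toy counts, echar and romper merge first, then comenzar and empezar, and the two groups join last. The quadratic search is only meant to show the technique; a real analysis would use an established clustering library.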

Identifier
DOI	https://doi.org/10.18710/DR8QKQ
Related Identifier	https://doi.org/10.21825/kzm.87036
Metadata Access	https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/DR8QKQ
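The Metadata Access link is an OAI-PMH GetRecord request. A minimal sketch using only the Python standard library rebuilds that URL and extracts the title elements from the response. The helper names are hypothetical, and the parser matches element local names only, so it does not assume a particular DataCite schema version or namespace in the server's reply.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_BASE = "https://dataverse.no/oai"


def get_record_url(identifier, metadata_prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL, leaving ':' and '/' unescaped."""
    query = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    return OAI_BASE + "?" + urlencode(query, safe=":/")


def titles(xml_text):
    """Return the stripped text of every element whose local name is 'title'."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "title" and el.text
    ]


def fetch_titles(identifier):
    """Fetch the record over the network and return its titles."""
    with urlopen(get_record_url(identifier), timeout=30) as resp:
        return titles(resp.read())
```

For example, `get_record_url("doi:10.18710/DR8QKQ")` reproduces the Metadata Access URL above, and `fetch_titles("doi:10.18710/DR8QKQ")` would retrieve the record live (requires network access).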

Provenance
Creator	Van Hulle, Sven
Publisher	DataverseNO
Contributor	Van Hulle, Sven; Ghent University; The Tromsø Repository of Language and Linguistics
Publication Year	2023
Rights	CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess	true
Contact	Van Hulle, Sven (Ghent University)

Representation
Resource Type	corpus data; Dataset
Format	text/plain; text/csv; type/x-r-syntax
Size	8147; 80915; 62008; 1260; 2182
Version	1.1
Discipline	Humanities