This research was conducted on a corpus of texts produced by first-year undergraduate students at the University of the Basque Country (UPV/EHU). The corpus, CATUC (Corpus académico de textos universitarios en castellano, "Academic Corpus of University Texts in Spanish"), consists of 270 texts and can be accessed upon approval by the UPV/EHU Ethics Committee. The texts were produced as part of the compulsory subject "Development of Communicative Competence I" over five academic years, from 2019/2020 to 2023/2024. The subcorpus for each academic year is balanced, and all texts were written in Spanish by bilingual Basque-Spanish students. Each folder contains three subcorpora: one comprising the full papers (270 texts, 838,757 words in total, and 20,225 unique word forms), one comprising the introductions (270 texts extracted from the full papers), and one comprising the conclusions (270 texts extracted from the full papers).
The participants wrote the texts individually, in the form of simulated conference proceedings papers, as the final test for the subject. Throughout the course, which followed a project-based learning methodology, the students studied the characteristics of written academic discourse, focusing on the identification of discursive genres, linguistic features, and discursive characteristics. The final project consisted of participating in a conference and individually submitting a paper for inclusion in the conference proceedings.
In these texts, the students presented a small research project based on their reading of scientific texts in their field of specialization. All projects are related to education.
The papers are between 8 and 10 pages long. Their sections, all of which had been worked on during the course, are the title, abstract, introduction, theoretical framework, methodology, results and discussion, conclusions, and bibliography.