Heroes Corpus

DOI

Each episode directory contains word-level and segment-level information of the whole episode and also parallel samples extracted under segments_eng and segments_spa subdirectories. Each sample is stored as an WAV audio file, text file and a CSV file containing word timing information and word-level paralinguistic and prosodic features. This dataset contains short audio and text excerpts from the TV series "Heroes" (Copyright Universal Media Studios (2006-2007,2007-2008, 2008-2009)). It is compiled and used only for research purposes. Creation of this dataset is partially financed by the UPF DTIC-Maria de Maeztu Strategic Program. This dataset is created with automated tools. There might be errors due to the automated process.

Heroes corpus contains mapped bilingual (English and Spanish) speech segments from the TV series Heroes. It contains 7000 single speaker speech segments extracted from the original and Spanish dubbed version of 21 episodes. Audio segments are accompanied with subtitle transcriptions and word-level prosodic/paralinguistic information.

Identifier
DOI https://doi.org/10.34810/data500
Metadata Access https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data500
Provenance
Creator Öktem, Alp ORCID logo
Publisher CORA.Repositori de Dades de Recerca
Publication Year 2023
Rights Custom Dataset Terms; info:eu-repo/semantics/openAccess; https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data500
OpenAccess true
Representation
Resource Type Textual data; Dataset
Format application/zip; text/plain
Size 1467061453; 1159928208; 1568
Version 1.0
Discipline Humanities