Heroes Corpus

Dataset

Each episode directory contains word-level and segment-level information for the whole episode, together with parallel samples extracted into the segments_eng and segments_spa subdirectories. Each sample is stored as a WAV audio file, a text file, and a CSV file containing word timing information and word-level paralinguistic and prosodic features. This dataset contains short audio and text excerpts from the TV series "Heroes" (Copyright Universal Media Studios (2006-2007, 2007-2008, 2008-2009)). It was compiled and is used for research purposes only. Creation of this dataset was partially financed by the UPF DTIC-Maria de Maeztu Strategic Program. The dataset was created with automated tools, so it may contain errors introduced by the automated process.
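The directory layout described above could be traversed along the following lines. This is a minimal sketch, not the author's tooling: only the segments_eng and segments_spa subdirectory names come from the description. The file extensions (.wav, .txt, .csv), the idea that the three files of one sample share a file stem, and the function names collect_samples, read_word_rows and episode_samples are assumptions made for illustration.

```python
import csv
from pathlib import Path

# Assumed extensions; the description only says "WAV audio file, text file and CSV file".
SAMPLE_EXTENSIONS = (".wav", ".txt", ".csv")


def collect_samples(segments_dir):
    """Group the files in one segments directory by shared file stem (an assumption)."""
    samples = {}
    for path in sorted(Path(segments_dir).iterdir()):
        ext = path.suffix.lower()
        if path.is_file() and ext in SAMPLE_EXTENSIONS:
            samples.setdefault(path.stem, {})[ext] = path
    # Keep only samples for which all three files are present.
    return {stem: files for stem, files in samples.items()
            if len(files) == len(SAMPLE_EXTENSIONS)}


def read_word_rows(csv_path):
    """Read a word-level CSV into a list of dicts keyed by its header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def episode_samples(episode_dir):
    """Return the complete samples of one episode, per language subdirectory."""
    episode_dir = Path(episode_dir)
    return {name: collect_samples(episode_dir / name)
            for name in ("segments_eng", "segments_spa")
            if (episode_dir / name).is_dir()}
```

The sketch assumes the CSV files are comma-delimited, UTF-8 encoded and start with a header row. If the actual files differ, the delimiter and encoding arguments would need adjusting. How English and Spanish samples are matched to each other is not stated here, so the sketch keeps the two languages separate.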

The Heroes corpus contains mapped bilingual (English and Spanish) speech segments from the TV series Heroes. It contains 7000 single-speaker speech segments extracted from the original and Spanish-dubbed versions of 21 episodes. Audio segments are accompanied by subtitle transcriptions and word-level prosodic/paralinguistic information.

Identifier
DOI	https://doi.org/10.34810/data500
Metadata Access	https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data500

Provenance
Creator	Öktem, Alp
Publisher	CORA.Repositori de Dades de Recerca
Publication Year	2023
Rights	Custom Dataset Terms; info:eu-repo/semantics/openAccess; https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data500
OpenAccess	true

Representation
Resource Type	Textual data; Dataset
Format	application/zip; text/plain
Size	1467061453 bytes; 1159928208 bytes; 1568 bytes
Version	1.0
Discipline	Humanities