ParCzech 3.0

ParCzech 3.0 is the third version of the ParCzech corpus. It consists of stenographic protocols recording the meetings of the Chamber of Deputies held in the 7th term (2013-2017) and the current 8th term (2017-Mar 2021). The protocols are provided in their original HTML format, in Parla-CLARIN TEI format, and in a format suitable for automatic speech recognition (ASR). The corpus is automatically enriched with morphological, syntactic, and named-entity annotations produced by UDPipe 2 and NameTag 2. The audio files are aligned with the texts in the annotated TEI files.

Identifier
PID http://hdl.handle.net/11234/1-3631
Related Identifier http://hdl.handle.net/11234/1-3436
Related Identifier http://hdl.handle.net/11234/1-5360
Related Identifier https://ufal.mff.cuni.cz/parczech
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-3631
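The Metadata Access URL above is an OAI-PMH GetRecord request that returns the record in Dublin Core (oai_dc). The following minimal Python sketch shows how such a request could be assembled and how Dublin Core fields could be read from a response. The function names (build_get_record_url, fetch_record, dc_fields) are hypothetical helpers introduced for illustration, and the parsing assumes the response uses the standard Dublin Core element namespace.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint and identifier are taken from the Metadata Access URL of this record.
BASE_URL = "http://lindat.mff.cuni.cz/repository/oai/request"
RECORD_ID = "oai:lindat.mff.cuni.cz:11234/1-3631"

# Standard Dublin Core element namespace (assumed to be used by oai_dc responses).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_get_record_url(identifier, metadata_prefix="oai_dc", base_url=BASE_URL):
    """Assemble an OAI-PMH GetRecord URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep ':' and '/' readable, as in the URL shown in the record
    )
    return f"{base_url}?{query}"


def fetch_record(url, timeout=30):
    """Download the raw XML response as text (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def dc_fields(xml_text):
    """Collect all Dublin Core elements into a dict of name -> list of values."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields
```

Under these assumptions, dc_fields(fetch_record(build_get_record_url(RECORD_ID))) would return the record's Dublin Core fields as a dictionary of lists. Values are kept as lists because Dublin Core elements may repeat within one record.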
Provenance
Creator Kopp, Matyáš; Stankov, Vladislav; Bojar, Ondřej; Hladká, Barbora; Straňák, Pavel
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2021
Funding Reference info:eu-repo/grantAgreement/EC/H2020/825460
Rights Public Domain Dedication (CC Zero); http://creativecommons.org/publicdomain/zero/1.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; application/x-gzip; application/x-tar; downloadable_files_count: 40
Discipline Linguistics