Automatic Paraphrases of Czech Reference Sentences for WMT11, 13 and 14

Dataset

This dataset contains automatic paraphrases of the official Czech reference translations for the shared task of the Workshop on Statistical Machine Translation (WMT). The data covers the years 2011, 2013 and 2014.

For each sentence, at most 10,000 paraphrases are included, selected at random from the full set.
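
The per-sentence cap can be illustrated with a minimal sketch. It assumes uniform random sampling without replacement, which the description does not confirm; the function name `cap_paraphrases` and the `seed` parameter are hypothetical and do not come from the authors' tooling.

```python
import random

# Cap stated in the dataset description.
MAX_PARAPHRASES = 10_000


def cap_paraphrases(paraphrases, limit=MAX_PARAPHRASES, seed=None):
    """Return at most `limit` paraphrases for one reference sentence.

    Assumption: the selection is uniform and without replacement.
    The actual selection procedure used for the dataset is not documented here.
    """
    items = list(paraphrases)
    if len(items) <= limit:
        # Nothing to drop: keep the full set in its original order.
        return items
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return rng.sample(items, limit)
```

A sentence with fewer paraphrases than the cap keeps all of them; only larger sets are subsampled.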

The dataset is intended to improve the automatic evaluation of machine translation output.

If you use this work, please cite the following paper:

Tamchyna Aleš, Barančíková Petra: Automatic and Manual Paraphrases for MT Evaluation. In Proceedings of LREC, 2016.

Identifier
PID	http://hdl.handle.net/11234/1-1665
Related Identifier	http://ufal.mff.cuni.cz/grants/deprefset
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1665
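
The Metadata Access URL above is an OAI-PMH GetRecord request. As a rough sketch, it can be assembled from its parts as follows; the helper name `build_get_record_url` is hypothetical, and the endpoint, prefix, and identifier are copied from the URL in this record.

```python
from urllib.parse import urlencode

# Endpoint taken from the Metadata Access URL in this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"


def build_get_record_url(identifier, metadata_prefix="oai_dc"):
    """Assemble an OAI-PMH GetRecord URL (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ':' and '/' unescaped so the result matches the URL listed above.
    return OAI_ENDPOINT + "?" + urlencode(params, safe=":/")


url = build_get_record_url("oai:lindat.mff.cuni.cz:11234/1-1665")
```

Fetching `url` with any HTTP client should return the Dublin Core record as XML, assuming the service is still available at this address.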

Provenance
Creator	Barančíková, Petra; Tamchyna, Aleš
Publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year	2016
Rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); http://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Czech
Resource Type	corpus
Format	application/x-gzip; text/plain; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline	Linguistics