Automatic Paraphrases of Czech Reference Sentences for WMT11, 13 and 14

This dataset contains automatic paraphrases of the official Czech reference translations for the Workshop on Statistical Machine Translation (WMT) shared task. The data covers the years 2011, 2013 and 2014.

For each sentence, at most 10,000 paraphrases are included, randomly selected from the full set of paraphrases.
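
The per-sentence cap described above can be illustrated with a minimal sketch. This record does not say how the selection was implemented, so uniform sampling without replacement is an assumption, and the names `cap_paraphrases`, `limit` and `seed` are hypothetical.

```python
import random


def cap_paraphrases(paraphrases, limit=10000, seed=None):
    """Return at most `limit` paraphrases for one sentence.

    Assumption: uniform sampling without replacement; the actual
    selection procedure used for the dataset is not documented here.
    """
    paraphrases = list(paraphrases)
    if len(paraphrases) <= limit:
        # Small sets are kept whole, in their original order.
        return paraphrases
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    return rng.sample(paraphrases, limit)


# Toy input: 25,000 placeholder strings standing in for one sentence's paraphrases.
full_set = [f"paraphrase-{i}" for i in range(25000)]
subset = cap_paraphrases(full_set, seed=0)
```

Sentences with fewer paraphrases than the cap pass through unchanged, so the cap only affects sentences with very large paraphrase sets.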

The dataset is intended to improve the automatic evaluation of machine translation output.

If you use this work, please cite the following paper:

Tamchyna Aleš, Barančíková Petra: Automatic and Manual Paraphrases for MT Evaluation. In Proceedings of LREC, 2016.

Identifier
PID http://hdl.handle.net/11234/1-1665
Related Identifier http://ufal.mff.cuni.cz/grants/deprefset
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1665
Provenance
Creator Barančíková, Petra; Tamchyna, Aleš
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2016
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); http://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Czech
Resource Type corpus
Format application/x-gzip; text/plain; text/plain; charset=utf-8; downloadable_files_count: 2
Discipline Linguistics
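
The Metadata Access link listed above is an OAI-PMH GetRecord request. As a rough sketch, the same URL can be assembled from the handle given under PID; the function name `oai_get_record_url` is hypothetical, and the sketch only builds the URL without contacting the server.

```python
from urllib.parse import urlencode

# Base endpoint taken from the Metadata Access field of this record.
BASE = "http://lindat.mff.cuni.cz/repository/oai/request"


def oai_get_record_url(handle, metadata_prefix="oai_dc"):
    """Build a GetRecord request URL for a handle such as '11234/1-1665'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:lindat.mff.cuni.cz:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the URL in the record.
    return f"{BASE}?{urlencode(params, safe=':/')}"


url = oai_get_record_url("11234/1-1665")
print(url)
```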