Corpus ID: 207847397

Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation

  • Nikolay Bogoychev, Rico Sennrich
  • Published 2019
  • Computer Science, Mathematics
  • ArXiv
  • The quality of neural machine translation can be improved by leveraging additional monolingual resources to create synthetic training data. Source-side monolingual data can be (forward-)translated into the target language for self-training; target-side monolingual data can be back-translated. It has been widely reported that back-translation delivers superior results, but could this be due to artefacts in the test sets? We perform a case study using the French-English news translation task and…
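
The two ways of building synthetic parallel data described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Translator` type, the function names, and the toy translators are all hypothetical stand-ins for trained translation models.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a translator maps one sentence to one sentence,
# and a pair is (source sentence, target sentence).
Translator = Callable[[str], str]
Pair = Tuple[str, str]


def forward_translate(src_mono: List[str], src_to_tgt: Translator) -> List[Pair]:
    """Self-training: the source side is real, the target side is synthetic."""
    return [(s, src_to_tgt(s)) for s in src_mono]


def back_translate(tgt_mono: List[str], tgt_to_src: Translator) -> List[Pair]:
    """Back-translation: the source side is synthetic, the target side is real."""
    return [(tgt_to_src(t), t) for t in tgt_mono]


# Toy stand-ins for trained models (assumptions for illustration only).
def toy_fr_to_en(sentence: str) -> str:
    return "<en> " + sentence


def toy_en_to_fr(sentence: str) -> str:
    return "<fr> " + sentence


# French->English system: French is the source, English is the target.
forward_pairs = forward_translate(["Bonjour le monde."], toy_fr_to_en)
back_pairs = back_translate(["Hello world."], toy_en_to_fr)
```

In both cases the resulting pairs would be mixed with genuine parallel data for training; the only difference is which side of each pair was written by a human.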
