A Call for Clarity in Reporting BLEU Scores

@inproceedings{Post2018ACF,
  title={A Call for Clarity in Reporting BLEU Scores},
  author={Matt Post},
  booktitle={WMT},
  year={2018}
}
  • Matt Post
  • Published in WMT 2018
The field of machine translation is blessed with new challenges resulting from the regular production of fresh test sets in diverse settings. But it is also cursed, with a lack of consensus on how to report scores from its dominant metric. Although people refer to "the" BLEU score, BLEU is in fact a parameterized metric whose scores can vary wildly with changes to its parameterization and, especially, its reference processing scheme; yet these details are often absent from papers or hard to determine. We quantify this variation, finding…
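The remedy the paper proposes is SacreBLEU, a tool that scores detokenized output, applies a fixed reference processing scheme internally, and prints a signature recording every parameter that went into the score. As a minimal sketch of what that looks like in practice, assuming the sacrebleu Python package and its v2.x API (the hypothesis and reference strings below are invented for illustration):

# Scoring detokenized MT output with sacrebleu (assumed v2.x API).
from sacrebleu.metrics import BLEU

hypotheses = [
    "The dog bit the man.",
    "It was not unexpected.",
]
# One inner list per reference set, aligned line-by-line with the hypotheses.
references = [[
    "The dog bit the man.",
    "It was not surprising.",
]]

bleu = BLEU()  # defaults follow the WMT scheme; no user-side tokenization
result = bleu.corpus_score(hypotheses, references)
print(result)                # corpus-level BLEU with n-gram precisions
print(bleu.get_signature())  # version/tokenizer signature for reproducibility

Because the signature pins down the parameterization, two papers that report the same signature are reporting directly comparable numbers.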
555 Citations
  • The Tatoeba Translation Challenge - Realistic Data Sets for Low Resource and Multilingual MT
  • Query-Key Normalization for Transformers
  • Findings of the First Shared Task on Machine Translation Robustness
  • Findings of the WMT 2020 Shared Task on Machine Translation Robustness
  • SLTEV: Comprehensive Evaluation of Spoken Language Translation
  • Revisiting Low-Resource Neural Machine Translation: A Case Study
