On Some Pitfalls in Automatic Evaluation and Significance Testing for MT

@inproceedings{Riezler2005OnSP,
  title={On Some Pitfalls in Automatic Evaluation and Significance Testing for MT},
  author={Stefan Riezler and John T. Maxwell},
  booktitle={IEEvaluation@ACL},
  year={2005}
}
We investigate some pitfalls regarding the discriminatory power of MT evaluation metrics and the accuracy of statistical significance tests. In a discriminative reranking experiment for phrase-based SMT we show that the NIST metric is more sensitive than BLEU or F-score despite their incorporation of aspects of fluency or meaning adequacy into MT evaluation. In an experimental comparison of two statistical significance tests we show that p-values are estimated more conservatively by approximate…
Highly Influential
This paper has highly influenced 11 other papers.
Highly Cited
This paper has 195 citations.

4 Figures & Tables

Statistics

[Chart: Citations per Year. x-axis: years 2006 to 2018; y-axis: citations, 0 to 30.]

Semantic Scholar estimates that this publication has 195 citations based on the available data.
