E-rating Machine Translation

Abstract

We describe our submissions to the WMT11 shared MT evaluation task: MTeRater and MTeRater-Plus. Both are machine-learned metrics that use features from e-rater®, an automated essay scoring engine designed to assess writing proficiency. Despite using only features from e-rater and making no comparison to reference translations, MTeRater achieves a sentence-level correlation with human rankings equivalent to BLEU. Since MTeRater only assesses fluency, we build a meta-metric, MTeRater-Plus, that incorporates adequacy by combining MTeRater with other MT evaluation metrics and heuristics. This meta-metric has a higher correlation with human rankings than either MTeRater or the individual MT metrics alone. However, we also find that e-rater features may not have a significant impact on correlation in every case.
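To illustrate the general idea of a machine-learned meta-metric that combines fluency features with scores from other MT metrics, the sketch below trains a pairwise ranking classifier over per-sentence metric scores. The feature names (fluency, bleu, ter) and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the authors' actual feature set or learner.

# Illustrative sketch only: combining a fluency score (e-rater-style) with
# other MT metric scores into a learned meta-metric via pairwise ranking.
# Feature names and the choice of learner are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def feature_vector(candidate):
    # `candidate` is a dict of hypothetical per-sentence scores for one
    # MT output, e.g. {"fluency": ..., "bleu": ..., "ter": ...}.
    return np.array([candidate["fluency"], candidate["bleu"], candidate["ter"]])

def pairwise_training_data(ranked_pairs):
    # Turn human pairwise rankings into a binary classification problem:
    # each (better, worse) pair yields the difference vector labeled 1
    # and its negation labeled 0.
    X, y = [], []
    for better, worse in ranked_pairs:
        diff = feature_vector(better) - feature_vector(worse)
        X.append(diff)
        y.append(1)
        X.append(-diff)
        y.append(0)
    return np.array(X), np.array(y)

# Toy data with made-up scores, just to show the shapes involved.
pairs = [
    ({"fluency": 0.8, "bleu": 0.35, "ter": 0.40},
     {"fluency": 0.5, "bleu": 0.30, "ter": 0.55}),
    ({"fluency": 0.6, "bleu": 0.45, "ter": 0.38},
     {"fluency": 0.4, "bleu": 0.20, "ter": 0.60}),
]
X, y = pairwise_training_data(pairs)
model = LogisticRegression().fit(X, y)

# Score a new sentence: higher decision values mean the learned
# combination of metrics prefers that sentence.
new_sentence = {"fluency": 0.7, "bleu": 0.33, "ter": 0.45}
print(model.decision_function([feature_vector(new_sentence)]))

The learned weights play the role of the meta-metric's combination function; at test time, candidate translations for the same source sentence can be ranked by their decision values.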

Cite this paper

@inproceedings{Parton2011EratingMT,
  title     = {E-rating Machine Translation},
  author    = {Kristen Parton and Joel R. Tetreault and Nitin Madnani and Martin Chodorow},
  booktitle = {WMT@EMNLP},
  year      = {2011}
}