Publications
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
We describe METEOR, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations, and that is designed to directly capture how well-ordered the matched words are in relation to the reference.
  • 1,951 citations
  • 435 highly influential citations
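The abstract above describes METEOR's core idea: score a hypothesis by unigram matches against the reference, then penalize fragmented word order. A minimal sketch of that scoring scheme, assuming exact-match unigrams only and the classic parameter defaults (the real metric adds stemming, synonym, and paraphrase matching modules, and `meteor_sketch` is a hypothetical name for illustration):

```python
def meteor_sketch(hypothesis, reference, alpha=0.9, beta=3.0, gamma=0.5):
    """METEOR-style score: recall-weighted F-mean of unigram matches,
    discounted by a fragmentation penalty over match 'chunks'."""
    hyp = hypothesis.split()
    ref = reference.split()

    # Greedy left-to-right alignment of exact unigram matches
    # (the real metric searches for the alignment with fewest chunks).
    matches = []          # (hyp_index, ref_index) pairs, in hyp order
    used_ref = set()
    for i, word in enumerate(hyp):
        for j, ref_word in enumerate(ref):
            if j not in used_ref and word == ref_word:
                matches.append((i, j))
                used_ref.add(j)
                break

    m = len(matches)
    if m == 0:
        return 0.0

    precision = m / len(hyp)
    recall = m / len(ref)
    # alpha = 0.9 weights recall heavily, as in the published metric.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # A chunk is a maximal run of matches contiguous in both strings;
    # more chunks means the matched words are less well-ordered.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(matches, matches[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1

    penalty = gamma * (chunks / m) ** beta
    return fmean * (1 - penalty)
```

With all six words matched, a perfectly ordered hypothesis forms a single chunk and scores near 1.0, while a scrambled hypothesis with the same unigrams splits into many chunks and is penalized, which is exactly the ordering sensitivity the abstract highlights.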
Meteor Universal: Language Specific Translation Evaluation for Any Target Language
This paper describes Meteor Universal, a version of the Meteor metric that brings language specific evaluation to previously unsupported target languages by (1) automatically extracting linguistic resources (paraphrase tables and function word lists) from the bitext used to train MT systems.
  • 935 citations
  • 266 highly influential citations
METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments
Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric.
  • 572 citations
  • 121 highly influential citations
Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems
We include Ranking and Adequacy versions of the metric shown to have high correlation with human judgments of translation quality as well as a more balanced Tuning version shown to outperform BLEU in minimum error rate training for a phrase-based Urdu-English system.
  • 330 citations
  • 83 highly influential citations
Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability
We provide a systematic analysis of the effects of optimizer instability (an extraneous variable that is seldom controlled for) on experimental outcomes, and make recommendations for reporting results more accurately.
  • 457 citations
  • 52 highly influential citations
The Meteor metric for automatic evaluation of machine translation
The Meteor Automatic Metric for Machine Translation evaluation, originally developed and released in 2004, was designed with the explicit goal of producing sentence-level scores which correlate well with human judgments of translation quality.
  • 227 citations
  • 32 highly influential citations
Humor Recognition and Humor Anchor Extraction
In this work, we first identify several semantic structures behind humor and design sets of features for each structure, and then employ a computational approach to recognize humor.
  • 88 citations
  • 25 highly influential citations
Parser Combination by Reparsing
We present a novel parser combination scheme that works by reparsing input sentences once they have already been parsed by several different parsers, generating results that surpass state-of-the-art accuracy levels for individual parsers.
  • 201 citations
  • 21 highly influential citations
A Classifier-Based Parser with Linear Run-Time Complexity
We present a classifier-based parser that produces constituent trees in linear time.
  • 135 citations
  • 15 highly influential citations
Meteor, M-BLEU and M-TER: Evaluation Metrics for High-Correlation with Human Rankings of Machine Translation Output
This paper describes our submissions to the machine translation evaluation shared task in ACL WMT-08.
  • 101 citations
  • 14 highly influential citations