Meteor Universal: Language Specific Translation Evaluation for Any Target Language

@inproceedings{Denkowski2014MeteorUL,
  title={Meteor Universal: Language Specific Translation Evaluation for Any Target Language},
  author={Michael J. Denkowski and Alon Lavie},
  booktitle={WMT@ACL},
  year={2014}
}
This paper describes Meteor Universal, released for the 2014 ACL Workshop on Statistical Machine Translation. Meteor Universal brings language specific evaluation to previously unsupported target languages by (1) automatically extracting linguistic resources (paraphrase tables and function word lists) from the bitext used to train MT systems and (2) using a universal parameter set learned from pooling human judgments of translation quality from several language directions. Meteor Universal is…
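Step (1) can be illustrated with a small sketch of how a function word list might be derived from the target side of a bitext by relative frequency; the file format, function name, and threshold below are illustrative assumptions, not Meteor Universal's exact procedure.

# Minimal sketch: treat the most frequent target-language tokens as function
# words. The 1e-3 relative-frequency threshold and plain-text input format
# are assumptions for illustration only.
from collections import Counter

def extract_function_words(target_path, min_rel_freq=1e-3):
    counts = Counter()
    total = 0
    with open(target_path, encoding="utf-8") as f:
        for line in f:
            tokens = line.lower().split()
            counts.update(tokens)
            total += len(tokens)
    if total == 0:
        return set()
    return {w for w, c in counts.items() if c / total >= min_rel_freq}

# Example: function_words = extract_function_words("bitext.target.tok")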
METEOR for multiple target languages using DBnary
This paper proposes an extension of METEOR for multiple target languages using an in-house lexical resource called DBnary (an extraction from Wiktionary provided to the community as Multilingual Lexical Linked Open Data).
Targeted Paraphrasing of Czech Sentences for Machine Translation Evaluation
This thesis focuses on improving the accuracy of machine translation into Czech using targeted paraphrasing. We develop and compare several approaches for creating new synthetic references…
IRIS: English-Irish Machine Translation System
IRIS, a statistical machine translation (SMT) system for translating between English and Irish, is described; the system is aimed at supporting human translators and enabling cross-lingual language technology tasks.
MEE : An Automatic Metric for Evaluation Using Embeddings for Machine Translation
MEE is an approach to automatic machine translation evaluation that leverages the similarity between embeddings of words in candidate and reference sentences to assess translation quality; the proposed metric is observed to correlate better with human judgements than existing widely used metrics.
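The core idea of scoring a candidate against a reference through word-embedding similarity can be sketched as follows; this is a generic greedy cosine-matching illustration assuming a precomputed word-to-vector dictionary, not MEE's exact formulation.

# Minimal sketch: average, over candidate words, of the best cosine match
# against the reference words. `embeddings` is assumed to map words to
# pretrained numpy vectors; words without vectors are skipped.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def embedding_similarity(candidate, reference, embeddings):
    cand = [embeddings[w] for w in candidate.split() if w in embeddings]
    ref = [embeddings[w] for w in reference.split() if w in embeddings]
    if not cand or not ref:
        return 0.0
    return sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)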
Machine Translation Evaluation: Unveiling the Role of Dense Sentence Vector Embedding for Morphologically Rich Language
This paper presents a meta-analysis of machine translation evaluation metrics such as BiLingual Evaluation Understudy (BLEU) and Metric for Evaluation of Translation with Explicit Ordering (METEOR) that shows clear trends of poor performance on certain types of translation problems.
Meteor++: Incorporating Copy Knowledge into Machine Translation Evaluation
A simple statistical method for copy knowledge extraction is introduced and incorporated into the Meteor metric, resulting in a new machine translation metric, Meteor++, which integrates copy knowledge and improves performance significantly on the WMT15 and WMT17 evaluation sets.
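The kind of statistical copy-knowledge extraction described above can be sketched roughly as follows: a source token is treated as copy-worthy if it also appears verbatim in the reference translation most of the time it occurs. The thresholds and corpus format are illustrative assumptions, not Meteor++'s exact statistics.

# Minimal sketch: collect source tokens (names, numbers, URLs, ...) that are
# usually copied unchanged into the reference. `parallel_pairs` is assumed to
# be an iterable of (source, reference) sentence strings.
from collections import Counter

def extract_copy_words(parallel_pairs, min_copy_rate=0.9, min_count=5):
    occurrences, copied = Counter(), Counter()
    for source, reference in parallel_pairs:
        ref_tokens = set(reference.split())
        for tok in set(source.split()):
            occurrences[tok] += 1
            if tok in ref_tokens:
                copied[tok] += 1
    return {t for t, n in occurrences.items()
            if n >= min_count and copied[t] / n >= min_copy_rate}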
Predicting Machine Translation Adequacy with Document Embeddings
The approach presented here learns a Bayesian ridge regressor over document skip-gram embeddings in order to automatically evaluate machine translation (MT) output by predicting semantic adequacy scores.
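The modelling step can be sketched with scikit-learn's BayesianRidge, assuming the document embeddings have already been computed (e.g. with a skip-gram style doc2vec model) and human adequacy scores are available for training; this shows only the regression step, not the paper's full pipeline.

# Minimal sketch: fit a Bayesian ridge regressor that maps document
# embeddings to adequacy scores. X is an (n_docs, dim) array of precomputed
# embeddings, y the corresponding human adequacy scores.
import numpy as np
from sklearn.linear_model import BayesianRidge

def train_adequacy_model(X, y):
    model = BayesianRidge()
    model.fit(X, y)
    return model

# Example with random placeholder data:
# X_train, y_train = np.random.rand(100, 300), np.random.rand(100)
# model = train_adequacy_model(X_train, y_train)
# scores = model.predict(np.random.rand(5, 300))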
Representation Based Translation Evaluation Metrics
Experiments on the WMT metric task show that the metric based on the combined representations achieves the best performance, outperforming state-of-the-art translation metrics by a large margin.
Dataset for comparable evaluation of machine translation between 11 South African languages
The Autshumato machine translation evaluation set is described; it contains data that can be used to evaluate machine translation systems between any of the 11 official South African languages.
KoBE: Knowledge-Based Machine Translation Evaluation
This work proposes a simple and effective method for machine translation evaluation that does not require reference translations and achieves the highest correlation with human judgements on 9 of the 18 language pairs from the WMT19 benchmark for evaluation without references.

References

Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems
This paper describes Meteor 1.3, our submission to the 2011 EMNLP Workshop on Statistical Machine Translation automatic evaluation metric tasks. New metric features include improved text…
Moses: Open Source Toolkit for Statistical Machine Translation
We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) …
Bleu: a Method for Automatic Evaluation of Machine Translation
This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
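The essence of BLEU, clipped (modified) n-gram precision combined with a brevity penalty, can be sketched for a single candidate/reference pair; real BLEU is computed at corpus level, usually with multiple references and smoothing, so this simplified version is for illustration only.

# Minimal sketch of BLEU for one candidate and one reference: geometric mean
# of clipped 1..4-gram precisions times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_precisions.append(math.log(max(clipped, 1e-9) / total))
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)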
Findings of the 2012 Workshop on Statistical Machine Translation
A large-scale manual evaluation of 103 machine translation systems submitted by 34 teams was conducted; the ranking of these systems was used to measure how strongly 12 automatic evaluation metrics correlate with human judgments of translation quality.
Paraphrasing with Bilingual Parallel Corpora
This work defines a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and shows how it can be refined to take contextual information into account.
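The pivot formulation at the heart of this approach computes the probability that e2 paraphrases e1 by marginalising over foreign phrases f in a bilingual phrase table, roughly p(e2 | e1) = sum over f of p(f | e1) * p(e2 | f); the toy phrase tables below stand in for probabilities estimated from a real parallel corpus.

# Minimal sketch of pivot-based paraphrase probability from two phrase tables
# represented as nested dicts of probabilities.
def paraphrase_probability(e1, e2, p_f_given_e, p_e_given_f):
    return sum(
        p_f * p_e_given_f.get(f, {}).get(e2, 0.0)
        for f, p_f in p_f_given_e.get(e1, {}).items()
    )

# Toy example (English phrases pivoted through German):
# p_f_given_e = {"under control": {"unter kontrolle": 0.8}}
# p_e_given_f = {"unter kontrolle": {"in check": 0.4, "under control": 0.5}}
# paraphrase_probability("under control", "in check", p_f_given_e, p_e_given_f)  # 0.32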
SPEDE: Probabilistic Edit Distance Metrics for MT Evaluation
This paper describes Stanford University's submission to the Shared Evaluation Task of WMT 2012; the proposed metric (SPEDE) computes probabilistic edit distance as a prediction of translation quality and includes a novel pushdown automaton extension of the pFSM model.
Statistical Phrase-Based Translation
The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations.
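Lexical weighting of a phrase pair can be sketched as follows: each target word is scored by the average word translation probability over the source words it is aligned to (or NULL if unaligned), and the phrase score is the product over target words. The data structures below are illustrative assumptions in the spirit of this approach, not an exact reproduction of the paper's implementation.

# Minimal sketch of lexical weighting for one phrase pair. `alignment` is a
# list of (source_index, target_index) links within the phrase pair, and `w`
# maps (target_word, source_word) to a word translation probability.
def lexical_weight(src_phrase, tgt_phrase, alignment, w):
    score = 1.0
    for i, e in enumerate(tgt_phrase):
        aligned = [j for (j, k) in alignment if k == i]
        if aligned:
            score *= sum(w.get((e, src_phrase[j]), 0.0) for j in aligned) / len(aligned)
        else:
            score *= w.get((e, "NULL"), 0.0)
    return score

# Example:
# lexical_weight(["das", "haus"], ["the", "house"], [(0, 0), (1, 1)],
#                {("the", "das"): 0.7, ("house", "haus"): 0.8})  # 0.56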
Findings of the 2011 Workshop on Statistical Machine Translation
The WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics, are described, showing how strongly 21 automatic evaluation metrics correlate with human judgments of translation quality.
Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric
TER-Plus, a new tunable MT metric that extends the Translation Edit Rate evaluation metric with tunable parameters and the incorporation of morphology, synonymy, and paraphrases, is explored, demonstrating significant differences between the types of human judgments.
TESLA at WMT 2011: Translation Evaluation and Tunable Metric
This paper describes the submission from the National University of Singapore to the WMT 2011 Shared Evaluation Task and the Tunable Metric Task. Our entry is TESLA in three different configurations: …