Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics

@inproceedings{Lin2004AutomaticEO,
  title={Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics},
  author={Chin-Yew Lin and Franz Josef Och},
  booktitle={ACL},
  year={2004}
}
In this paper we describe two new objective automatic evaluation methods for machine translation. The first method is based on longest common subsequence between a candidate translation and a set of reference translations. Longest common subsequence takes into account sentence level structure similarity naturally and identifies longest co-occurring in-sequence n-grams automatically. The second method relaxes strict n-gram matching to skip-bigram matching. Skip-bigram is any pair of words in… 
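To make the two measures concrete, here is a minimal sentence-level sketch in the spirit of the paper's definitions: an LCS-based F-measure (recall against the reference length, precision against the candidate length) and a skip-bigram F-measure, where a skip-bigram is any in-order pair of words with arbitrary gaps allowed. The function names, the single-reference simplification, the max_gap option, and the default beta = 1 are choices of this sketch, not part of the paper's tooling.

from collections import Counter
from math import comb

def lcs_length(x, y):
    # Standard dynamic-programming longest common subsequence length.
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x, 1):
        for j, yj in enumerate(y, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if xi == yj else max(table[i - 1][j], table[i][j - 1])
    return table[len(x)][len(y)]

def skip_bigrams(tokens, max_gap=None):
    # All in-order word pairs; max_gap=None allows arbitrary gaps between the two words.
    pairs = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if max_gap is None or j - i - 1 <= max_gap:
                pairs[(tokens[i], tokens[j])] += 1
    return pairs

def f_measure(matches, candidate_total, reference_total, beta=1.0):
    # F-measure combining precision and recall computed from raw counts.
    if matches == 0:
        return 0.0
    precision = matches / candidate_total
    recall = matches / reference_total
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

def lcs_score(candidate, reference, beta=1.0):
    cand, ref = candidate.split(), reference.split()
    return f_measure(lcs_length(cand, ref), len(cand), len(ref), beta)

def skip_bigram_score(candidate, reference, beta=1.0, max_gap=None):
    cand, ref = candidate.split(), reference.split()
    matches = sum((skip_bigrams(cand, max_gap) & skip_bigrams(ref, max_gap)).values())
    return f_measure(matches, comb(len(cand), 2), comb(len(ref), 2), beta)

reference = "police killed the gunman"
for candidate in ["police kill the gunman", "the gunman kill police"]:
    print(candidate, round(lcs_score(candidate, reference), 3), round(skip_bigram_score(candidate, reference), 3))

Note how the second candidate, which reverses the sentence structure, is penalized by both measures even though it contains the same unigrams as the first.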
Automatic evaluation of translation quality using expanded N-gram co-occurrence
  • Ying Qin, Q. Wen, Jinquan Wang
  • Computer Science
    2009 International Conference on Natural Language Processing and Knowledge Engineering
  • 2009
TLDR
Evaluation experiments on learner translations and a machine translation corpus show that the expanded n-gram co-occurrence measure outperforms pure BLEU and NIST evaluation, achieving higher correlation with human assessments.
Stochastic Iterative Alignment for Machine Translation Evaluation
TLDR
A metric based on stochastic iterative string alignment (SIA) is proposed that aims to combine the strengths of both approaches; it outperforms existing metrics in overall evaluation and works especially well for fluency evaluation.
Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
TLDR
Experimental results show that the proposed automatic evaluation method for machine translation using noun-phrase chunking obtains the highest correlations among the compared methods in both sentence-level adequacy and fluency.
Dependency-based Automatic Enumeration of Semantically Equivalent Word Orders for Evaluating Japanese Translations
TLDR
A method is proposed to enumerate scrambled sentences from dependency trees of reference sentences, and experiments show that it improves sentence-level correlation between RIBES and human-judged adequacy.
A Naïve Automatic MT Evaluation Method without Reference Translations
TLDR
Users can obtain reliable MT evaluation in the absence of reference translations, which greatly improves the utility of MT evaluation metrics.
Application of Prize based on Sentence Length in Chunk-based Automatic Evaluation of Machine Translation
TLDR
This paper proposes a new automatic evaluation metric for machine translation based on chunking between the reference and candidate translations, applies a sentence-length-based prize to the metric, and shows that it correlates stably with human judgment.
Extending BLEU Evaluation Method with Linguistic Weight
TLDR
Linear regression is adopted to capture human perception of translation quality via word types and n-gram length; the results indicate that this method yields much better evaluation performance than the original BLEU for both human and machine translation.
Evaluation of machine translation with dependent Skip-Ngrams
  • HongPeng Yu, Hongwei Xu
  • Computer Science
    Proceedings of 2012 International Conference on Measurement, Information and Control
  • 2012
TLDR
An automatic evaluation method for machine translation systems that extends the skip-bigram idea with unequal-length grams and dependency relations, qualifying the grams with linguistic knowledge.
STD: An Automatic Evaluation Metric for Machine Translation Based on Word Embeddings
TLDR
Experimental results show that STD achieves better and more robust performance than a range of state-of-the-art metrics for both segment-level and system-level evaluation.
Context-aware Discriminative Phrase Selection for Statistical Machine Translation
TLDR
Inspired by techniques commonly used in Word Sense Disambiguation, classifiers that use local context to predict possible phrase translations are trained, and a significant improvement in adequacy is obtained.

References

Showing 1-10 of 20 references
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
TLDR
NIST was commissioned to develop an MT evaluation facility based on the IBM work; the resulting measure is now available from NIST and serves as the primary evaluation measure for TIDES MT research.
ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation
TLDR
A new method, ORANGE, is introduced for evaluating automatic machine translation evaluation metrics automatically, with no extra human involvement other than a set of reference translations.
A novel string-to-string distance measure with applications to machine translation evaluation
TLDR
A string-to-string distance measure is presented which extends the edit distance with block transpositions as constant-cost edit operations, and its use as an evaluation criterion for machine translation is demonstrated.
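For orientation, the sketch below implements only the plain word-level edit distance that such measures build on; the block-transposition operation described in this reference is deliberately omitted, so this is not the proposed measure itself.

def word_edit_distance(candidate, reference):
    # Plain word-level Levenshtein distance: insertions, deletions, and
    # substitutions at unit cost. Block transpositions are not handled here.
    cand, ref = candidate.split(), reference.split()
    previous_row = list(range(len(ref) + 1))
    for i, cand_word in enumerate(cand, 1):
        current_row = [i]
        for j, ref_word in enumerate(ref, 1):
            current_row.append(min(previous_row[j] + 1,                          # delete cand_word
                                   current_row[j - 1] + 1,                       # insert ref_word
                                   previous_row[j - 1] + (cand_word != ref_word)))  # substitute (free if equal)
        previous_row = current_row
    return previous_row[-1]

print(word_edit_distance("police kill the gunman", "police killed the gunman"))  # prints 1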
Bleu: a Method for Automatic Evaluation of Machine Translation
TLDR
This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
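For comparison with the LCS and skip-bigram measures above, the following is a minimal single-sentence, single-reference rendering of BLEU's two ingredients, clipped n-gram precision and the brevity penalty; real BLEU is pooled over a corpus and typically uses multiple references, so treat this purely as an illustration.

from collections import Counter
from math import exp, log

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    # Geometric mean of clipped n-gram precisions times a brevity penalty.
    # Corpus-level pooling, multiple references, and smoothing are left out.
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        clipped_matches = sum((cand_ngrams & ngram_counts(ref, n)).values())
        total = sum(cand_ngrams.values())
        if total == 0 or clipped_matches == 0:
            return 0.0  # a zero precision at any order zeroes the unsmoothed score
        log_precisions.append(log(clipped_matches / total))
    brevity_penalty = 1.0 if len(cand) > len(ref) else exp(1.0 - len(ref) / len(cand))
    return brevity_penalty * exp(sum(log_precisions) / max_n)

print(round(sentence_bleu("police kill the gunman", "police killed the gunman", max_n=2), 3))  # 0.5; the unsmoothed 4-gram score would be 0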
Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons
TLDR
This paper shows how to induce an N-best translation lexicon from a bilingual text corpus using statistical properties of the corpus together with four external knowledge sources, which improve lexicon quality by up to 137% over the plain vanilla statistical method, and approach human performance.
Automatic Evaluation of Translation Quality: Outline of Methodology and Report on Pilot Experiment
The original motivation for the work reported here is the desire to improve the evaluation of the performance of computer systems that produce natural language text.
Using multiple edit distances to automatically rank machine translation output
TLDR
An automatic ranking method that encodes machine-translated sentences, together with their human-assigned ranks, into multi-dimensional vectors from which a rank classifier is learned in the form of a decision tree (DT).
Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics
TLDR
The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations across various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research
TLDR
This paper defines evaluation criteria which are more adequate than pure edit distance and describes how the measurement along these quality criteria is performed semi-automatically in a fast, convenient and above all consistent way using this tool and the corresponding graphical user interface.
Evaluation of machine translation and its evaluation
TLDR
The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives and has an intuitive graphical interpretation, which can facilitate insight into how MT systems might be improved.
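The unigram-based F-measure mentioned here is simple enough to state directly; a sketch under the usual bag-of-words reading (single reference, balanced F) is:

from collections import Counter

def unigram_f_measure(candidate, reference):
    # Balanced F-measure over multiset (bag-of-words) unigram overlap.
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(unigram_f_measure("police kill the gunman", "police killed the gunman"))  # 0.75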