"Bilingual Expert" Can Find Translation Errors

@inproceedings{Fan2018BilingualEC,
  title={"Bilingual Expert" Can Find Translation Errors},
  author={Kai Fan and Bo Li and Fengming Zhou and Jiayi Wang},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2018}
}
  • Kai Fan, Bo Li, Fengming Zhou, Jiayi Wang
  • Published in
    AAAI Conference on Artificial…
    25 July 2018
  • Computer Science
The performance of machine translation (MT) systems is usually evaluated with the BLEU metric when golden references are provided. However, during model inference or production deployment, golden references are usually expensive to obtain, since they require human annotation with bilingual expertise. To address the problem of translation quality estimation (QE) without references, we propose a general framework for automatic evaluation of the translation output for the QE task in the…

Verdi: Quality Estimation and Error Detection for Bilingual Corpora

Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora, is proposed; it beats the winner of the competition and outperforms other baseline methods by a large margin.

Select the Best Translation from Different Systems Without Reference

A new method that mixes MT metrics is designed to automatically score translation hypotheses from different systems against their references, so as to construct pseudo human-annotated data, along with a novel QE model based on Multi-BERT and Bi-RNN with a joint-encoding strategy.

MDQE: A More Accurate Direct Pretraining for Machine Translation Quality Estimation

This work argues that there are still gaps between the predictor and the estimator in both data quality and training objectives, which preclude QE models from benefiting from a large number of parallel corpora more directly, and proposes a novel framework that provides a more accurate direct pretraining for QE tasks.

Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement

The results show not only that the proposed dataset is more consistent with human judgment but also that the proposed tag-correcting strategies are effective.

Ensemble-based Transfer Learning for Low-resource Machine Translation Quality Estimation

This paper proposes an ensemble-based predictor-estimator QE model with transfer learning to overcome the QE data scarcity challenge by leveraging QE scores from other miscellaneous languages and translation results of targeted languages.

Towards Making the Most of Pre-trained Translation Model for Quality Estimation

Conditional Masked Language Modeling (CMLM) and Denoising Restoration (DR) are proposed, which learn to predict masked tokens at the target side conditioned on the source sentence and can adapt the pre-trained translation model to the QE-style prediction task.

Target Oriented Data Generation for Quality Estimation of Machine Translation

This paper proposes an approach to generate pseudo QE training data by leveraging the provided labeled corpus in this task, and describes a sentence specific data expansion strategy to incrementally boost the model performance.

SOURCE: SOURce-Conditional Elmo-style Model for Machine Translation Quality Estimation

This work mainly explores the utilization of pre-trained translation models in QE and adopts a bi-directional translation-like strategy, similar to ELMo, but additionally conditions on source sentences.

Self-Supervised Quality Estimation for Machine Translation

This work proposes a self-supervised method for both sentence- and word-level QE, which performs quality estimation by recovering the masked target words, and shows that it outperforms previous unsupervised methods on several QE tasks in different language pairs and domains.

Practical Perspectives on Quality Estimation for Machine Translation

It is demonstrated that, while classical QE regression models fare poorly on this task, they can be re-purposed by replacing the output regression layer with a binary classification one, achieving 50-60% recall at 90% precision.

References

Ensembling Factored Neural Machine Translation Models for Automatic Post-Editing and Quality Estimation

This work presents a novel approach to Automatic Post-Editing (APE) and Word-Level Quality Estimation (QE) using ensembles of specialized Neural Machine Translation (NMT) systems, training a suite of NMT models that use different input representations, but share the same output space.

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.

Combining Quality Estimation and Automatic Post-editing to Enhance Machine Translation output

We investigate different strategies for combining quality estimation (QE) and automatic post-editing (APE) to improve the output of machine translation (MT) systems. The joint contribution of the two…

Neural Machine Translation of Rare Words with Subword Units

This paper introduces a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units, and empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English-German and English-Russian by 1.3 BLEU.

Statistical Approaches to Computer-Assisted Translation

Alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems in a European project in two real tasks.

Pushing the Limits of Translation Quality Estimation

A new, carefully engineered, neural model is stacked into a rich feature-based word-level quality estimation system and the output of an automatic post-editing system is used as an extra feature, obtaining striking results on WMT16.

Exploiting Objective Annotations for Measuring Translation Post-editing Effort

It is shown that estimations resulting from using post-editing time, a simple and objective annotation, can reliably indicate translation post-editing effort in a practical, task-based scenario.

Achieving Human Parity on Automatic Chinese to English News Translation

It is found that Microsoft's latest neural machine translation system has reached a new state-of-the-art, and that the translation quality is at human parity when compared to professional human translations.

A Study of Translation Edit Rate with Targeted Human Annotation

A new, intuitive measure for evaluating machine-translation output is examined, one that avoids the knowledge intensiveness of more meaning-based approaches and the labor-intensiveness of human judgments; the results indicate that HTER correlates with human judgments better than HMETEOR, and that the four-reference variants of TER and HTER correlate with human judgments as well as, or better than, a second human judgment does.

Neural Post-Editing Based on Quality Estimation

This work finds that only a small number of edit operations are required for most machine translation outputs, through analysis of the training set of WMT17 APE en-de task, and can bring considerable relief from the overcorrection problem in APE.