Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing

  Brian Thompson and Matt Post
We propose the use of a sequence-to-sequence paraphraser for automatic machine translation evaluation. The paraphraser takes a human reference as input and then force-decodes and scores an MT system output. We propose training the aforementioned paraphraser as a multilingual NMT system, treating paraphrasing as a zero-shot "language pair" (e.g., Russian to Russian). We denote our paraphraser "unbiased" because the mode of our model's output probability is centered around a copy of the input… 
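The core scoring operation, force-decoding a system output under a reference-conditioned paraphraser, can be sketched as below. The toy log-probability function is a hypothetical stand-in for the actual multilingual NMT paraphraser; it only mimics the copy-centered output distribution the abstract describes.

```python
import math

def forced_decode_score(reference, hypothesis, token_logprob):
    """Score `hypothesis` by force-decoding it token by token under a
    paraphrase model conditioned on `reference`; returns the
    length-normalized log-probability."""
    total = 0.0
    prefix = []
    for token in hypothesis:
        total += token_logprob(reference, prefix, token)
        prefix.append(token)
    return total / max(len(hypothesis), 1)

# Hypothetical toy "paraphraser": tokens that appear in the reference are
# far more probable, loosely mimicking a copy-centered output distribution.
def toy_logprob(reference, prefix, token):
    return math.log(0.8) if token in reference else math.log(0.01)

ref = "the cat sat on the mat".split()
good = "the cat sat on the mat".split()   # exact copy of the reference
bad = "dogs run fast".split()             # unrelated system output
assert forced_decode_score(ref, good, toy_logprob) > forced_decode_score(ref, bad, toy_logprob)
```

With a real model, `token_logprob` would come from the decoder's softmax at each forced step; the length normalization keeps scores comparable across outputs of different lengths.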
Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity
A simple paraphrase generation algorithm which discourages the production of n-grams that are present in the input and which produces paraphrases that better preserve meaning and are more grammatical, for the same level of lexical diversity.
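The n-gram-discouraging idea can be sketched as a per-step scoring penalty during decoding. The penalty value and the exact blocking rule below are illustrative assumptions, not the paper's algorithm:

```python
def input_ngrams(tokens, n):
    """Collect all n-grams of the input sentence as tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def penalized_score(base_score, prefix, candidate, blocked, n, penalty=5.0):
    """Subtract a penalty if emitting `candidate` would complete an n-gram
    copied from the input (hypothetical scoring hook for illustration)."""
    ngram = tuple(prefix[-(n - 1):] + [candidate])
    if len(ngram) == n and ngram in blocked:
        return base_score - penalty
    return base_score

src = "the quick brown fox".split()
blocked = input_ngrams(src, 2)
# "quick" after "the" completes the input bigram ("the", "quick") -> penalized
assert penalized_score(0.0, ["the"], "quick", blocked, 2) == -5.0
# "slow" after "the" is not an input bigram -> no penalty
assert penalized_score(0.0, ["the"], "slow", blocked, 2) == 0.0
```

Tuning the penalty trades off lexical diversity against fluency, which is the knob the summary alludes to.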
Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task
This paper describes the authors' contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation, and presents several submissions based on BLEURT, a previously published metric which uses transfer learning.
BARTScore: Evaluating Generated Text as Text Generation
This work conceptualizes the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models, and proposes a metric, BARTScore, with a number of variants that can be flexibly applied in an unsupervised fashion to the evaluation of text from different perspectives.
MaskEval: Weighted MLM-Based Evaluation for Text Summarization and Simplification
This work introduces MaskEval, a reference-less metric for text summarization and simplification that can be adapted to different quality dimensions: it performs masked language modeling on the concatenation of the candidate and the source texts, with an attention-like weighting mechanism to modulate the relative importance of each MLM step.
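The weighted-MLM scoring loop can be sketched as below; the toy masked-LM and the uniform weights are stand-ins (MaskEval learns its weighting), so treat every probability here as an assumption:

```python
import math

def weighted_mlm_score(candidate, source, mask_logprob, weight):
    """Mask each candidate token within the concatenation of candidate and
    source, then combine the per-token MLM log-probabilities using weights
    (here `weight` is a hypothetical stand-in for a learned mechanism)."""
    context = candidate + source
    scores, weights = [], []
    for i, token in enumerate(candidate):
        masked = context[:i] + ["<mask>"] + context[i + 1:]
        scores.append(mask_logprob(masked, token))
        weights.append(weight(i, token))
    z = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / z

# Toy MLM: a masked token is easy to recover if it also occurs elsewhere
# in the masked context; uniform weights for simplicity.
def toy_mlm(masked_context, token):
    return math.log(0.9) if token in masked_context else math.log(0.05)

uniform = lambda i, token: 1.0
src = "the cat sat".split()
faithful = "the cat sat".split()
hallucinated = "dogs bark loudly".split()
assert weighted_mlm_score(faithful, src, toy_mlm, uniform) > \
       weighted_mlm_score(hallucinated, src, toy_mlm, uniform)
```

Because the source is part of the masked context, candidate tokens that are supported by the source are easier to recover, which is what makes the score reference-less.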
NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures
Compared to baselines such as sentence embeddings, translation-based measures prove competitive in paraphrase identification and are more robust against adversarial or multilingual input, especially if proper normalization is applied.
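One normalization used in translation-based similarity is dividing out the self-translation probability; a minimal sketch follows, where the per-token toy translation model is an assumption made for illustration:

```python
import math

def toy_logprob(target, given):
    # Hypothetical translation model: tokens present in the conditioning
    # sentence are likely to be "translated" faithfully.
    return sum(math.log(0.9 if t in given else 0.1) for t in target)

def normalized_similarity(a, b):
    """log p(b|a) - log p(b|b): the self-translation probability acts as a
    normalizer, so a sentence compared with itself scores exactly 0."""
    return toy_logprob(b, a) - toy_logprob(b, b)

a = "the cat sat".split()
para = "a cat sat".split()
unrelated = "dogs bark loudly".split()
assert normalized_similarity(a, para) > normalized_similarity(a, unrelated)
assert normalized_similarity(a, a) == 0.0
```

The normalization corrects for the fact that some sentences are intrinsically easier to translate (or regenerate) than others, which is what makes the raw probabilities comparable.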
Onception: Active Learning with Expert Advice for Real World Machine Translation
This paper assumes a real world human-in-the-loop scenario in which the source sentences may not be readily available, but instead arrive in a stream, and proposes to dynamically combine multiple strategies using prediction with expert advice.
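Dynamically combining strategies via prediction with expert advice is typically an exponential-weights (Hedge-style) update; a minimal sketch, with hypothetical per-round losses for three sentence-selection strategies:

```python
import math

def hedge_update(weights, losses, eta=1.0):
    """One round of prediction with expert advice: exponentially
    downweight experts (here, active learning strategies) by their
    observed loss, then renormalize."""
    new = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    z = sum(new)
    return [w / z for w in new]

# Three hypothetical strategies; strategy 0 keeps incurring the lowest
# loss, so its weight should grow over the stream.
weights = [1 / 3] * 3
for _ in range(10):
    weights = hedge_update(weights, [0.1, 0.5, 0.9])
assert weights[0] > weights[1] > weights[2]
assert abs(sum(weights) - 1.0) < 1e-9
```

In the streaming setting the losses arrive online, so the mixture adapts without ever seeing the full source pool up front.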
Are References Really Needed? Unbabel-IST 2021 Submission for the Metrics Shared Task
The joint contribution of Unbabel and IST to the WMT 2021 Metrics Shared Task is presented and it is shown that reference-free COMET models are becoming competitive with reference-based models, even outperforming the best COMET model from 2020 on this year’s development data.
Uncertainty-Aware Machine Translation Evaluation
This paper combines the COMET framework with two uncertainty estimation methods, Monte Carlo dropout and deep ensembles, to obtain quality scores along with confidence intervals, and analyzes the trustworthiness of the predicted quality.
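The Monte Carlo dropout side can be sketched as repeated stochastic forward passes; here the dropout noise of a hypothetical COMET-like scorer is simulated with Gaussian noise, and the interval construction is a simple normal approximation:

```python
import random
import statistics

def mc_dropout_interval(score_once, n=100, z=1.96):
    """Run a stochastic quality estimator n times (dropout left on at
    inference) and return (mean, lower, upper) of a normal-approximation
    confidence interval for the mean score."""
    samples = [score_once() for _ in range(n)]
    mu = statistics.mean(samples)
    sd = statistics.stdev(samples)
    half = z * sd / n ** 0.5
    return mu, mu - half, mu + half

# Hypothetical noisy scorer: true quality 0.7 plus simulated dropout noise.
rng = random.Random(0)
noisy_score = lambda: 0.7 + rng.gauss(0, 0.05)
mean, low, high = mc_dropout_interval(noisy_score)
assert low <= mean <= high
```

Wide intervals flag segments where the metric's score should not be trusted, which is the practical payoff the paper analyzes.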
Results of the WMT20 Metrics Shared Task
An extensive analysis on influence of different reference translations on metric reliability, how well automatic metrics score human translations, and major discrepancies between metric and human scores when evaluating MT systems are presented.
UniTE: Unified Translation Evaluation
This paper proposes monotonic regional attention to control the interaction among input segments, and unified pretraining to better adapt to multi-task training; the resulting model universally surpasses various state-of-the-art or winning methods across tasks.


Machine Translation Evaluation using Bi-directional Entailment
A new metric for Machine Translation evaluation based on bi-directional entailment, using BERT's pre-trained transformer implementation fine-tuned on the MNLI corpus for natural language inference; this metric is found to correlate better with human-annotated scores than traditional metrics at the system level.
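The bi-directional check can be sketched with a toy entailment model. Both the min-combination and the token-coverage "NLI" below are illustrative assumptions, not the paper's method; a real implementation would call a fine-tuned NLI classifier in each direction:

```python
def bidirectional_entailment_score(entail_prob, hypothesis, reference):
    """Combine entailment in both directions; taking the minimum means a
    one-way entailment (e.g. a shorter, underspecified translation)
    does not receive full credit."""
    return min(entail_prob(hypothesis, reference),
               entail_prob(reference, hypothesis))

# Toy "NLI model": a premise entails a hypothesis if it covers all of the
# hypothesis's tokens (coverage fraction as a probability stand-in).
def toy_entail(premise, hypothesis):
    p, h = set(premise.split()), set(hypothesis.split())
    return len(p & h) / len(h)

ref = "the cat sat on the mat"
good = "the cat sat on the mat"
partial = "the cat sat"   # entailed by ref, but does not entail it back
assert bidirectional_entailment_score(toy_entail, good, ref) == 1.0
assert bidirectional_entailment_score(toy_entail, partial, ref) < 1.0
```

Requiring entailment in both directions is what distinguishes semantic equivalence from mere entailment, which one-directional NLI scoring cannot do.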
Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext
We consider the problem of learning general-purpose, paraphrastic sentence embeddings in the setting of Wieting et al. (2016b). We use neural machine translation to generate sentential paraphrases
Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks
  J. Tiedemann and Yves Scherrer. Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, 2019.
The results show that the perplexity of multilingual neural translation models when applied to paraphrases of the source language is significantly reduced in each of the cases, indicating that meaning can be grounded in translation.
ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations
This work uses ParaNMT-50M, a dataset of more than 50 million English-English sentential paraphrase pairs, to train paraphrastic sentence embeddings that outperform all supervised systems on every SemEval semantic textual similarity competition, in addition to showing how it can be used for paraphrase generation.
Improving Zero-shot Translation with Language-Independent Constraints
This work intentionally creates an encoder architecture that is independent of the source language, and adds regularization methods to the standard Transformer model, so that the whole architecture becomes more robust in zero-shot conditions.
ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation
ParaBank is presented, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality and is used to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks.
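The contract of lexically-constrained decoding can be illustrated with a post-hoc check on a finished hypothesis; the real algorithm enforces these constraints inside beam search, so this sketch only shows what a valid output must satisfy:

```python
def satisfies_constraints(tokens, positive, negative):
    """Check a finished hypothesis against lexical constraints: every
    positive phrase must appear, and no negative phrase may appear.
    (A post-hoc check; constrained beam search enforces this during
    decoding rather than filtering afterwards.)"""
    text = " ".join(tokens)
    return all(p in text for p in positive) and not any(n in text for n in negative)

hyp = "the committee approved the proposal".split()
assert satisfies_constraints(hyp, positive=["approved"], negative=["rejected"])
assert not satisfies_constraints(hyp, positive=["rejected"], negative=[])
```

Negative constraints are what drive rewriting: forbidding words of the original sentence forces the decoder to find paraphrases.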
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
An architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts using a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, coupled with an auxiliary decoder and trained on publicly available parallel corpora.
Filtering Pseudo-References by Paraphrasing for Automatic Evaluation of Machine Translation
The experimental results of the WMT 2016 and 2017 datasets show that the proposed method achieved higher correlation with human evaluation than the sentence BLEU (SentBLEU) baselines with a single reference and with unfiltered pseudo-references.
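The filtering step can be sketched as below; the Jaccard similarity standing in for the paraphrase-based filter, the 0.5 threshold, and the unigram-overlap stand-in for sentence BLEU are all illustrative assumptions:

```python
def filter_pseudo_refs(reference, pseudo_refs, similarity, threshold=0.5):
    """Keep only pseudo-references judged similar enough to the human
    reference (threshold and similarity function are illustrative)."""
    return [p for p in pseudo_refs if similarity(reference, p) >= threshold]

def max_overlap_score(hypothesis, references):
    # Stand-in for sentence BLEU: best unigram overlap over all references.
    h = set(hypothesis.split())
    return max(len(h & set(r.split())) / len(h) for r in references)

jaccard = lambda a, b: len(set(a.split()) & set(b.split())) / len(set(a.split()) | set(b.split()))

ref = "the cat sat on the mat"
pseudo = ["a cat sat on a mat", "stock prices fell sharply"]
kept = filter_pseudo_refs(ref, pseudo, jaccard)
assert kept == ["a cat sat on a mat"]
assert max_overlap_score("the cat sat", [ref] + kept) == 1.0
```

Filtering matters because unfiltered pseudo-references can reward hypotheses that diverge from the reference's meaning, which is the failure mode the paper's correlations expose.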
Analyzing Uncertainty in Neural Machine Translation
This study proposes tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations and shows that search works remarkably well but that models tend to spread too much probability mass over the hypothesis space.
Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.