Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models

Chris Hokamp, John Glover, Demian Gholipour Ghalandari
We study several methods for full or partial sharing of the decoder parameters of multi-lingual NMT models. Using only the WMT 2019 shared task parallel datasets for training, we evaluate both fully supervised and zero-shot translation performance in 110 unique translation directions. We use additional test sets and re-purpose evaluation methods recently used for unsupervised MT in order to evaluate zero-shot translation performance for language pairs where no gold-standard parallel data is…
Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
This work sets a milestone by building a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples, and demonstrates effective transfer learning ability, significantly improving translation quality of low-resource languages, while keeping high-resource language translation quality on-par with competitive bilingual baselines.
A Comprehensive Survey of Multilingual Neural Machine Translation
An in-depth survey of existing literature on multilingual neural machine translation is presented, which categorizes various approaches based on their central use-case and then further categorizes them based on resource scenarios, underlying modeling principles, core issues, and challenges.
A Survey of Multilingual Neural Machine Translation
An in-depth survey of existing literature on multilingual neural machine translation (MNMT) is presented, in which various approaches are categorized based on their central use-case and then further categorized based on resource scenarios, underlying modeling principles, core issues, and challenges.
Bridging Philippine Languages With Multilingual Neural Machine Translation
The Philippines is home to more than 150 languages, which are considered low-resourced even among its major languages. This results in a lack of effort to develop a translation system for the…
Neural Machine Translation for Low-Resource Languages: A Survey
A detailed survey of research advancements in low-resource language NMT (LRL-NMT), along with a quantitative analysis aimed at identifying the most popular solutions, and a set of guidelines to select the possible NMT technique for a given LRL data setting.
Survey of Low-Resource Machine Translation
We present a survey covering the state of the art in low-resource machine translation. There are currently around 7000 languages spoken in the world and almost all language pairs lack significant…
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any…
Task Selection Policies for Multitask Learning
This work provides an empirical evaluation of the performance of some common task selection policies in a synthetic bandit-style setting and on the GLUE benchmark for natural language understanding, and suggests a method based on counterfactual estimation that leads to improved model performance in experimental settings.
Consistency by Agreement in Zero-Shot Neural Machine Translation
This paper reformulates multilingual translation as probabilistic inference, defines the notion of zero-shot consistency, and introduces a consistent agreement-based training method that encourages the model to produce equivalent translations of parallel sentences in auxiliary languages.
Zero-Shot Dual Machine Translation
Experiments show that a zero-shot dual system, trained on English-French and English-Spanish, outperforms by large margins a standard NMT system in zero-shot translation performance on Spanish-French (both directions).
The Missing Ingredient in Zero-Shot Neural Machine Translation
This paper first diagnoses why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs, and proposes auxiliary losses on the NMT encoder that impose representational invariance across languages.
Multilingual Neural Machine Translation with Task-Specific Attention
This work proposes task-specific attention models, a simple but effective technique for improving the quality of sequence-to-sequence neural multilingual translation that seeks to retain as much of the parameter sharing generalization of NMT models as possible, while still allowing for language-specific specialization of the attention model to a particular language pair or task.
Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations
This work addresses the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences, and proposes two simple but effective approaches: decoder pre-training and back-translation.
Phrase-Based & Neural Unsupervised Machine Translation
This work investigates how to learn to translate when having access to only large monolingual corpora in each language, and proposes two model variants, a neural and a phrase-based model, which are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters.
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
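The artificial-token trick summarized above can be sketched in a few lines. This is a minimal illustration, not Google's implementation; the `<2xx>` token format and the helper name are assumptions for the example:

```python
def add_target_token(source_tokens, target_lang):
    """Prepend an artificial token naming the target language, so a
    single shared multilingual model knows which language to produce.
    The "<2xx>" token format is an illustrative assumption."""
    return [f"<2{target_lang}>"] + list(source_tokens)

# The same English source can be routed to different target languages:
add_target_token(["hello", "world"], "es")  # ["<2es>", "hello", "world"]
add_target_token(["hello", "world"], "fr")  # ["<2fr>", "hello", "world"]
```

Because the target language is encoded in the input rather than in the architecture, the model can be asked at inference time to translate between pairs never seen together in training, which is what enables zero-shot translation.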
Parameter Sharing Methods for Multilingual Self-Attentional Translation Models
This work examines parameter sharing techniques that strike a happy medium between full sharing and individual training, specifically focusing on the self-attentional Transformer model, and finds that the full parameter sharing approach leads to increases in BLEU scores mainly when the target languages are from a similar language family.
Unsupervised Machine Translation Using Monolingual Corpora Only
This work proposes a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space and effectively learns to translate without using any labeled data.
Unsupervised Neural Machine Translation
This work proposes a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation.
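The denoising half of that training recipe corrupts a sentence and asks the model to reconstruct the original. A minimal sketch of such a corruption function is below; the drop probability and local-shuffle window are illustrative assumptions, not the paper's exact settings:

```python
import random

def corrupt(tokens, drop_prob=0.1, shuffle_window=3):
    """Noise a sentence for denoising autoencoding: randomly drop some
    tokens, then locally shuffle the survivors so each token moves at
    most shuffle_window - 1 positions from its original place."""
    kept = [t for t in tokens if random.random() > drop_prob] or tokens[:1]
    # Jitter each index by a bounded random offset and re-sort.
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

random.seed(0)
corrupt("the cat sat on the mat".split())
```

Training the encoder-decoder to map `corrupt(x)` back to `x` in each language, combined with iterative backtranslation between languages, is what lets the system learn without any parallel data.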