Massively Multilingual Neural Machine Translation

Roee Aharoni, Melvin Johnson, Orhan Firat
Multilingual Neural Machine Translation enables training a single model that supports translation from multiple source languages into multiple target languages. We perform extensive experiments in training massively multilingual NMT models, involving up to 103 distinct languages and 204 translation directions simultaneously. We explore different setups for training such models and analyze the trade-offs between translation quality and various modeling decisions. We report results on the… 

Improving Multilingual Neural Machine Translation with Auxiliary Source Languages

This work proposes to improve multilingual translation in a more common scenario by exploiting synthetic source sentences from auxiliary languages: the model is trained on synthetic multi-source corpora, and random masking is applied to enable flexible inference with single-source or bi-source inputs.

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

This work argues that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcomes this bottleneck via language-specific components and deeper NMT architectures.

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

This work sets a milestone by building a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples, and demonstrates effective transfer learning ability, significantly improving translation quality of low-resource languages while keeping high-resource language translation quality on par with competitive bilingual baselines.

Multilingual Neural Machine Translation

This tutorial will cover the latest advances in NMT approaches that leverage multilingualism, especially to enhance low-resource translation, and focus on the following topics: modeling parameter sharing for multi-way models, massively multilingual models, training protocols, language divergence, transfer learning, zero-shot/zero-resource learning, pivoting, multilingual pre-training and multi-source translation.

Distributionally Robust Multilingual Machine Translation

This paper proposes a new learning objective for MNMT based on distributionally robust optimization, which minimizes the worst-case expected loss over the set of language pairs and shows how to practically optimize this objective for large translation corpora using an iterated best response scheme.
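The worst-case objective described above can be illustrated with a minimal sketch. It only computes the quantity being minimized (the largest expected loss across language pairs) and does not implement the paper's iterated best response scheme. The function name, the language-pair labels, and the loss values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def worst_case_loss(losses_by_pair: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the language pair with the highest mean loss, and that loss.

    Assumption: each list holds per-example losses for one language pair,
    and the expected loss is approximated by the sample mean.
    """
    expected = {pair: mean(values) for pair, values in losses_by_pair.items()}
    worst_pair = max(expected, key=expected.get)
    return worst_pair, expected[worst_pair]


# Hypothetical per-example losses for two language pairs.
losses = {
    "en-de": [1.0, 2.0],
    "en-gu": [3.0, 5.0],
}
pair, loss = worst_case_loss(losses)
print(pair, loss)
```

A training loop built on this idea would take gradient steps that reduce the returned loss rather than the average over all pairs, so the weakest language pair drives each update.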

Multilingual Simultaneous Neural Machine Translation

This paper proposes a multilingual approach to SIMT, where a single model simultaneously translates between multiple language pairs, and results on translating from two Germanic languages and three Romance languages show the single multilingual model is on par with or better than individual models.

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

Evaluating the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages shows gains in zero-shot transfer in 4 out of 5 tasks.

Multilingual Agreement for Multilingual Neural Machine Translation

This work proposes a novel agreement-based method to encourage multilingual agreement among different translation directions, which minimizes the differences among them.

Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation

This work introduces a simple method to translate between thirteen languages using a single encoder and a single decoder, making use of multilingual data to improve UNMT for all language pairs, and proposes two knowledge distillation methods to further enhance multilingual UNMT performance.

Balancing Training for Multilingual Neural Machine Translation

Experiments show the proposed method not only consistently outperforms heuristic baselines in terms of average performance, but also offers flexible control over which languages' performance is optimized.



Multilingual Neural Machine Translation With Soft Decoupled Encoding

This paper proposes Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-level information intelligently without requiring heuristic preprocessing such as pre-segmenting the data.

Multilingual Neural Machine Translation with Task-Specific Attention

This work proposes task-specific attention models, a simple but effective technique for improving the quality of sequence-to-sequence neural multilingual translation that seeks to retain as much of the parameter sharing generalization of NMT models as possible, while still allowing for language-specific specialization of the attention model to a particular language-pair or task.

Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder

This paper presents the first attempts at building a multilingual Neural Machine Translation framework under a unified approach in which the information shared among languages can be helpful in the translation of individual language pairs, and points out a novel way to make use of monolingual data with Neural Machine Translation.

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
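The artificial-token idea in the summary above can be sketched as a small preprocessing step. This is an illustration only: the function name and the `<2xx>` token format are assumptions made for this example and are not taken from the summary.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token naming the desired target language.

    Assumption: the token is written as <2xx>, where xx is a language code.
    """
    return f"<2{target_lang}> {source_sentence}"


# The same source sentence is routed to different target languages
# simply by changing the leading token; the model itself is unchanged.
examples = [
    ("Hello, how are you?", "es"),
    ("Hello, how are you?", "de"),
]
for sentence, lang in examples:
    print(add_target_token(sentence, lang))
```

Because the target language is carried by the input itself, one model with a shared vocabulary can serve every direction without any architectural changes per language pair.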

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

An architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts using a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, coupled with an auxiliary decoder and trained on publicly available parallel corpora.

Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism

We propose multi-way, multilingual neural machine translation. The proposed approach enables a single neural translation model to translate between multiple languages, with a number of parameters

Rapid Adaptation of Neural Machine Translation to New Languages

This paper proposes methods based on starting with massively multilingual “seed models”, which can be trained ahead-of-time, and then continuing training on data related to the LRL, leading to a novel, simple, yet effective method of “similar-language regularization”.

Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

This work examines parameter sharing techniques that strike a happy medium between full sharing and individual training, specifically focusing on the self-attentional Transformer model and finds that the full parameter sharing approach leads to increases in BLEU scores mainly when the target languages are from a similar language family.

A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation

This work provides a quantitative and comparative analysis of the translations produced by bilingual, multilingual and zero-shot systems; investigates the translation quality of two of the currently dominant neural architectures in MT, the recurrent and the Transformer ones; and quantitatively explores how the closeness between languages influences zero-shot translation.

Zero-Shot Cross-lingual Classification Using Multilingual Neural Machine Translation

A simple framework for cross-lingual transfer learning by reusing the encoder from a multilingual NMT system and stitching it with a task-specific classifier component, which can perform classification in a new language for which no classification data was seen during training, showing that zero-shot classification is possible and remarkably competitive.