Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution

Xavier García, Noah Constant, Ankur P. Parikh, Orhan Firat
North American Chapter of the Association for Computational Linguistics (NAACL)

We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models, paving the way towards efficient continual learning for multilingual machine translation. Our approach is suitable for large-scale datasets, applies to distant languages with unseen scripts, incurs only minor degradation in translation performance on the original language pairs, and provides competitive performance even in the case where we only possess…
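
The abstract does not spell out how the vocabulary is adapted. The sketch below assumes one common form of vocabulary substitution: a new subword vocabulary is built that covers the new languages, embeddings are reused for tokens shared with the old vocabulary, and tokens that appear only in the new vocabulary get a fresh random initialization. The function name `substitute_vocabulary`, the `init_scale` parameter and the Gaussian initialization are illustrative assumptions, not the paper's implementation.

```python
import random
from typing import Dict, List


def substitute_vocabulary(
    old_vocab: List[str],
    old_embeddings: List[List[float]],
    new_vocab: List[str],
    init_scale: float = 0.02,
    seed: int = 0,
) -> List[List[float]]:
    """Build an embedding table for new_vocab (hypothetical helper).

    Assumption: tokens shared with old_vocab keep their trained embedding;
    all other tokens are initialized from a small Gaussian. old_embeddings
    must be non-empty and aligned row-by-row with old_vocab.
    """
    rng = random.Random(seed)
    dim = len(old_embeddings[0])
    old_index: Dict[str, int] = {tok: i for i, tok in enumerate(old_vocab)}

    new_embeddings: List[List[float]] = []
    for tok in new_vocab:
        if tok in old_index:
            # Shared token: copy the trained row so existing knowledge is kept.
            new_embeddings.append(list(old_embeddings[old_index[tok]]))
        else:
            # Token unseen by the original model (e.g. a new script):
            # random init, to be learned during continued training.
            new_embeddings.append([rng.gauss(0.0, init_scale) for _ in range(dim)])
    return new_embeddings
```

For example, with an old vocabulary `["<pad>", "▁the", "▁cat"]` and a new one `["<pad>", "▁cat", "▁кот"]`, the rows for `<pad>` and `▁cat` are copied and only `▁кот` starts from scratch. Reusing shared rows keeps the substituted model close to the original for subwords it already knows, while the rest of the network is left untouched.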


Adapting Large Multilingual Machine Translation Models to Unseen Low Resource Languages via Vocabulary Substitution and Neuron Selection

This work proposes a method to adapt large multilingual machine translation models to a low-resource language (LRL) that was not included in the pre-training or training phases; it improves on both zero-shot translation and the stronger baseline of directly fine-tuning the model on the low-resource data.

Controlling Translation Formality Using Pre-trained Multilingual Language Models

Results show that this strategy can approach the translation quality and formality control achieved by dedicated translation models; however, the nature of the underlying pre-trained language model and of the fine-tuning samples greatly affects the results.

Continual Learning in Multilingual NMT via Language-Specific Embeddings

This paper proposes a technique for adding a new source or target language to an existing multilingual NMT model without re-training it on the initial set of languages. It consists of replacing the…

Building Machine Translation Systems for the Next Thousand Languages

Results are described in three research domains, including building clean, web-mined datasets for 1500+ languages (by leveraging semi-supervised pre-training for language identification and developing data-driven language identification techniques) and developing practical MT models for under-served languages.

Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters

This work studies the compositionality of language and domain adapters in the context of machine translation, aiming at parameter-efficient adaptation to multiple domains and languages simultaneously, and at cross-lingual transfer in domains where parallel data is unavailable for certain language pairs.

Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages

It is shown that unsupervised sequence-segmentation performance can be transferred to extremely low-resource languages by pre-training a Masked Segmental Language Model (Downey et al., 2021) multilingually, and the multilingual pre-trained approach yields consistent segmentation quality across target dataset sizes.

On Robust Incremental Learning over Many Multilingual Steps

This work proposes a method for robust incremental learning over dozens of training steps using data from a variety of languages, and shows that a combination of data augmentation and an optimized training regime allows the model to keep improving even for as many as 10,000 training steps.

Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task

This paper describes the Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) low-resource translation systems for the WMT22 shared task, which achieve BLEU scores of 17.0 and 30.4 for English to/from Livonian.

Training a T5 Using Lab-sized Resources

This paper presents various techniques that make it possible to train a large language model, in a reasonable amount of time, using the resources a modest research lab might have.

Few-Shot Regularization to Tackle Catastrophic Forgetting in Multilingual Machine Translation

This work derives a new loss function that minimizes the forgetting of previously learned tasks by actively re-weighting past samples and penalizing weights that deviate too much from the original model.
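
The summary does not give the exact form of the penalty on weights that drift from the original model. As a minimal sketch, assuming a plain squared-L2 penalty toward the original weights (the sample re-weighting part is not shown), the regularized objective adds `strength * sum_i (theta_i - theta0_i)^2` to the task loss. The names `deviation_penalty`, `regularized_loss` and `strength` are hypothetical.

```python
from typing import Sequence


def deviation_penalty(
    params: Sequence[float], original: Sequence[float], strength: float
) -> float:
    """Assumed squared-L2 distance between current and original weights."""
    if len(params) != len(original):
        raise ValueError("params and original must have the same length")
    return strength * sum((p - p0) ** 2 for p, p0 in zip(params, original))


def regularized_loss(
    task_loss: float,
    params: Sequence[float],
    original: Sequence[float],
    strength: float,
) -> float:
    """Task loss on new data plus a penalty that discourages forgetting."""
    return task_loss + deviation_penalty(params, original, strength)
```

A larger `strength` keeps the model closer to its original weights (less forgetting) at the cost of slower adaptation to the new language pairs.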



From Bilingual to Multilingual Neural Machine Translation by Incremental Training

This work proposes a new training schedule, based on joint training and language-independent encoder/decoder modules, that allows the system to scale to more languages without modifying previously trained components while still allowing zero-shot translation.

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

This work sets a milestone by building a single massively multilingual NMT model that handles 103 languages and is trained on over 25 billion examples. The model demonstrates effective transfer learning, significantly improving translation quality for low-resource languages while keeping high-resource translation quality on par with competitive bilingual baselines.

Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary

We propose a method to transfer knowledge across neural machine translation (NMT) models by means of a shared dynamic vocabulary. Our approach makes it possible to extend an initial model for a given language…

Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism

We propose multi-way, multilingual neural machine translation. The proposed approach enables a single neural translation model to translate between multiple languages, with a number of parameters…

Rapid Adaptation of Neural Machine Translation to New Languages

This paper proposes methods based on starting with massively multilingual “seed models”, which can be trained ahead of time, and then continuing training on data related to the low-resource language (LRL), leading to a novel, simple, yet effective method of “similar-language regularization”.

Massively Multilingual Neural Machine Translation

It is shown that massively multilingual many-to-many models are effective in low resource settings, outperforming the previous state-of-the-art while supporting up to 59 languages in 116 translation directions in a single model.

A Multilingual View of Unsupervised Machine Translation

This work considers a novel setup where one language in the (source, target) pair has no associated parallel data, but there may exist auxiliary parallel data containing the other language; such data can naturally be utilized within the probabilistic framework via a novel cross-translation loss term.

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
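
The target-language token mechanism can be sketched as a preprocessing step: every source sentence is prefixed with a token naming the desired output language, so one shared model can be trained on, and asked for, many directions. The `<2xx>` spelling and the helper names `add_target_token` and `tag_corpus` are assumptions for illustration; in a real system the token also has to be part of the shared wordpiece vocabulary.

```python
from typing import List, Tuple


def add_target_token(sentence: str, target_lang: str) -> str:
    """Prefix an artificial token (assumed format "<2xx>") naming the output language."""
    return f"<2{target_lang}> {sentence}"


def tag_corpus(pairs: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Turn (source, target, target_lang) triples into tagged training pairs.

    Only the source side is modified; the target sentence is left as-is.
    """
    return [(add_target_token(src, lang), tgt) for src, tgt, lang in pairs]
```

For example, `tag_corpus([("Hello", "Hola", "es"), ("Hello", "Bonjour", "fr")])` yields `[("<2es> Hello", "Hola"), ("<2fr> Hello", "Bonjour")]`: the same source sentence maps to different outputs depending only on the prefix, which is also what lets such a model be prompted for direction pairs it never saw together in training.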

Adapting Multilingual Neural Machine Translation to Unseen Languages

This work extensively explores data selection in popular multilingual NMT settings, namely (zero-shot) translation and adaptation from a multilingual pre-trained model, for both translation directions, and shows that dynamically adapting the model’s vocabulary yields a more favourable segmentation for the low-resource language (LRL) than direct adaptation.

Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT

The RE-LM approach, which reuses an LM that is pretrained only on the high-resource language, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian and English-Albanian, yielding more than +8.3 BLEU points for all four translation directions.