Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution
@inproceedings{Garca2021TowardsCL,
  title     = {Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution},
  author    = {Xavier Garc{\'i}a and Noah Constant and Ankur P. Parikh and Orhan Firat},
  booktitle = {North American Chapter of the Association for Computational Linguistics},
  year      = {2021}
}
We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models, paving the way towards efficient continual learning for multilingual machine translation. Our approach is suitable for large-scale datasets, applies to distant languages with unseen scripts, incurs only minor degradation on the translation performance for the original language pairs and provides competitive performance even in the case where we only possess…
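To make the vocabulary-substitution idea concrete, here is a minimal sketch (not the paper's exact implementation) of how a trained embedding matrix might be remapped from an old subword vocabulary to a new one that covers an added language: tokens shared by the two vocabularies keep their learned vectors, while tokens unique to the new vocabulary are freshly initialized. The function name and the random-initialization scheme are illustrative assumptions.

```python
import numpy as np

def remap_embeddings(old_embeddings, old_vocab, new_vocab, init_std=0.02, seed=0):
    """Sketch of vocabulary substitution for an embedding matrix.

    old_embeddings: (len(old_vocab), dim) array of trained vectors.
    old_vocab / new_vocab: lists of subword strings.
    Tokens present in both vocabularies reuse their trained vectors;
    tokens unique to new_vocab get random vectors (assumed initializer).
    """
    rng = np.random.default_rng(seed)
    dim = old_embeddings.shape[1]
    old_index = {tok: i for i, tok in enumerate(old_vocab)}

    new_embeddings = rng.normal(0.0, init_std, size=(len(new_vocab), dim))
    reused = 0
    for j, tok in enumerate(new_vocab):
        i = old_index.get(tok)
        if i is not None:
            new_embeddings[j] = old_embeddings[i]
            reused += 1
    print(f"reused {reused}/{len(new_vocab)} embeddings from the old vocabulary")
    return new_embeddings

# Toy usage: a 4-token old vocabulary and a 5-token new one sharing 3 tokens.
old_vocab = ["<pad>", "▁the", "▁cat", "s"]
new_vocab = ["<pad>", "▁the", "▁кот", "s", "ы"]
old_emb = np.random.default_rng(1).normal(size=(len(old_vocab), 8))
new_emb = remap_embeddings(old_emb, old_vocab, new_vocab)
```

In this scheme, the more subwords the old and new vocabularies share, the more of the original model's knowledge carries over, which is why overlap with the original languages matters even when adding a language with an unseen script.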
17 Citations
Adapting Large Multilingual Machine Translation Models to Unseen Low Resource Languages via Vocabulary Substitution and Neuron Selection
- Computer Science, AMTA
- 2022
A method is proposed to adapt large multilingual machine translation models to a low-resource language (LRL) that was not included during the pre-training/training phases; it improves on both the zero-shot baseline and the stronger baseline of directly fine-tuning the model on the low-resource data.
Controlling Translation Formality Using Pre-trained Multilingual Language Models
- Linguistics, Computer Science, IWSLT
- 2022
Results show that this strategy can approach the translation quality and formality control achieved by dedicated translation models; however, the nature of the underlying pre-trained language model and of the fine-tuning samples greatly impacts results.
Continual Learning in Multilingual NMT via Language-Specific Embeddings
- Linguistics, Computer Science, WMT
- 2021
This paper proposes a technique for adding a new source or target language to an existing multilingual NMT model without re-training it on the initial set of languages. It consists of replacing the…
Building Machine Translation Systems for the Next Thousand Languages
- Computer Science, ArXiv
- 2022
Results in three research domains are described: building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques, and developing practical MT models for under-served languages.
Parameter-Efficient Finetuning for Robust Continual Multilingual Learning
- Computer Science, ArXiv
- 2022
The proposed pipeline, LAFT-URIEL, improves the spread of gains over the supported languages while reducing the magnitude of language-specific losses incurred, and develops novel fine-tuning strategies that jointly minimize language-specific forgetting while encouraging the positive cross-lingual transfer observed in this setup.
Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters
- Computer Science, WMT
- 2021
This work studies the compositionality of language and domain adapters in the context of machine translation, aiming at parameter-efficient adaptation to multiple domains and languages simultaneously and at cross-lingual transfer in domains where parallel data is unavailable for certain language pairs.
Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages
- Computer Science, ACL
- 2022
It is shown that unsupervised sequence-segmentation performance can be transferred to extremely low-resource languages by pre-training a Masked Segmental Language Model (Downey et al., 2021) multilingually, and the multilingual pre-trained approach yields consistent segmentation quality across target dataset sizes.
On Robust Incremental Learning over Many Multilingual Steps
- Computer Science, ArXiv
- 2022
This work proposes a method for robust incremental learning over dozens of training steps using data from a variety of languages, and shows that a combination of data augmentation and an optimized training regime allows the model to keep improving even after as many as 10,000 training steps.
Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task
- Computer Science, ArXiv
- 2022
This paper describes the Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) low-resource translation systems for the WMT22 shared task, which achieve BLEU scores of 17.0 and 30.4 for English to/from Livonian.
Training a T5 Using Lab-sized Resources
- Computer Science, ArXiv
- 2022
This paper presents various techniques that make it possible to train a large language model using the resources a modest research lab might have, and to do so in a reasonable amount of time.
References
Showing 1-10 of 38 references
From Bilingual to Multilingual Neural Machine Translation by Incremental Training
- Computer Science, ACL
- 2019
This work proposes a new training schedule, based on joint training and language-independent encoder/decoder modules, that allows the system to scale to more languages without modifying previously trained components and enables zero-shot translation.
Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
- Computer Science, ArXiv
- 2019
This work sets a milestone by building a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples, and demonstrates effective transfer learning ability, significantly improving translation quality of low-resource languages, while keeping high-resource language translation quality on-par with competitive bilingual baselines.
Transfer Learning in Multilingual Neural Machine Translation with Dynamic Vocabulary
- Computer Science, IWSLT
- 2018
We propose a method to transfer knowledge across neural machine translation (NMT) models by means of a shared dynamic vocabulary. Our approach allows extending an initial model for a given language…
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
- Computer Science, NAACL
- 2016
We propose multi-way, multilingual neural machine translation. The proposed approach enables a single neural translation model to translate between multiple languages, with a number of parameters…
Rapid Adaptation of Neural Machine Translation to New Languages
- Computer Science, EMNLP
- 2018
This paper proposes methods based on starting with massively multilingual “seed models”, which can be trained ahead-of-time, and then continuing training on data related to the LRL, leading to a novel, simple, yet effective method of “similar-language regularization”.
Massively Multilingual Neural Machine Translation
- Computer Science, NAACL
- 2019
It is shown that massively multilingual many-to-many models are effective in low resource settings, outperforming the previous state-of-the-art while supporting up to 59 languages in 116 translation directions in a single model.
A Multilingual View of Unsupervised Machine Translation
- Computer Science, FINDINGS
- 2020
A novel setup is proposed where one language in the (source, target) pair is not associated with any parallel data, but auxiliary parallel data containing the other language may exist; this auxiliary data can naturally be utilized in the proposed probabilistic framework via a novel cross-translation loss term.
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
- Computer Science, TACL
- 2017
This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
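The artificial target-language token described above amounts to a very small preprocessing step; a hedged sketch is shown below, where the `<2xx>` token format is an assumption for illustration and the exact string depends on how the shared wordpiece vocabulary was built.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token telling a shared multilingual model
    which language to produce.

    The '<2xx>' format is assumed for illustration; the exact token string
    depends on the convention used when building the shared vocabulary.
    """
    return f"<2{target_lang}> {source_sentence}"

# One English source, two target languages, a single multilingual model:
print(add_target_token("How are you?", "es"))  # -> "<2es> How are you?"
print(add_target_token("How are you?", "de"))  # -> "<2de> How are you?"
```

Because the target language is signalled only through this input token, the same encoder-decoder parameters serve every direction, which is what makes zero-shot combinations of seen source and target languages possible.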
Adapting Multilingual Neural Machine Translation to Unseen Languages
- Computer Science, IWSLT
- 2019
This work extensively explores data selection in popular multilingual NMT settings, namely in (zero-shot) translation, and in adaptation from a multilingual pre-trained model, for both directions, and shows that dynamic adaptation of the model’s vocabulary results in a more favourable segmentation for the LRL in comparison with direct adaptation.
Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT
- Computer Science, EMNLP
- 2020
The RE-LM approach, which reuses an LM that is pretrained only on the high-resource language, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian and English-Albanian, yielding more than +8.3 BLEU points for all four translation directions.