Multilingual Denoising Pre-training for Neural Machine Translation

@article{Liu2020MultilingualDP,
  title={Multilingual Denoising Pre-training for Neural Machine Translation},
  author={Yinhan Liu and Jiatao Gu and Naman Goyal and Xian Li and Sergey Edunov and Marjan Ghazvininejad and Mike Lewis and Luke Zettlemoyer},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  volume={8},
  pages={726-742}
}
Abstract

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous…
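To make the denoising objective above concrete, here is a minimal Python sketch of the kind of noise function the abstract describes: sentence order is permuted within an instance, word spans are replaced by a single mask token, and a language-id token is appended, with the decoder trained to reconstruct the original text. The 35% masking ratio and Poisson(3.5) span lengths follow the paper's description, but the whitespace tokenization, the token strings, and the 0.2 span-start probability are simplifications of this sketch, not the released implementation.

import random
import numpy as np

MASK, LANG_ID = "<mask>", "[en_XX]"  # placeholder token strings, not the real vocabulary

def noise(sentences, mask_ratio=0.35, poisson_lambda=3.5):
    """Return (noised source tokens, reconstruction target tokens)."""
    target = " ".join(sentences).split()
    # sentence permutation: shuffle the order of sentences within the instance
    shuffled = random.sample(sentences, len(sentences))
    tokens = " ".join(shuffled).split()
    # text infilling: replace Poisson-length word spans with a single mask token
    budget = int(mask_ratio * len(tokens))          # roughly 35% of words get masked
    noised, i = [], 0
    while i < len(tokens):
        if budget > 0 and random.random() < 0.2:    # 0.2 is an arbitrary span-start rate
            span = min(max(1, np.random.poisson(poisson_lambda)), budget, len(tokens) - i)
            noised.append(MASK)
            i += span
            budget -= span
        else:
            noised.append(tokens[i])
            i += 1
    # the language-id token marks which language the instance comes from
    return noised + [LANG_ID], target

src, tgt = noise(["We present mBART .", "It is pre-trained on monolingual corpora ."])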
Citations

Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
TLDR: Investigates the benefits and drawbacks of freezing parameters, and of adding new ones, when fine-tuning a pre-trained model for machine translation (MT).
Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
Guanhua Chen, Shuming Ma, +4 authors Furu Wei. ArXiv, 2021.
TLDR: SixT++ is a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages; it outperforms all current state-of-the-art unsupervised methods on Nepali and Sinhala, translating both into and from English.
Multilingual Translation from Denoising Pre-Training
TLDR: Finds that multilingual fine-tuning significantly improves over multilingual models trained from scratch for zero-shot translation on non-English directions, and introduces the ML50 benchmark to facilitate reproducible research by standardizing training and evaluation data.
Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation
TLDR: A continual pre-training (CPT) framework on top of mBART that adapts it to unseen languages and consistently improves fine-tuning performance over the mBART baseline, as well as other strong baselines, across all tested low-resource translation pairs containing unseen languages.
Cross-Modal Transfer Learning for Multilingual Speech-to-Text Translation
TLDR: The recipe for cross-modal and cross-lingual transfer learning (XMTL) is simple and generalizable: an adaptor module bridges modules pretrained in different modalities, and an efficient fine-tuning step leverages the knowledge from the pretrained modules while making them work on a drastically different downstream task.
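A rough sketch of that adaptor idea, assuming PyTorch and invented dimensions: a small bottleneck module projects states from a pretrained speech encoder into the input space expected by a pretrained text decoder so the two can be fine-tuned together.

import torch
import torch.nn as nn

class Adaptor(nn.Module):
    """Bridge a pretrained speech encoder (hypothetical 1024-dim states) to a
    pretrained multilingual text decoder (hypothetical 512-dim inputs)."""
    def __init__(self, speech_dim=1024, text_dim=512, bottleneck=256):
        super().__init__()
        self.down = nn.Linear(speech_dim, bottleneck)
        self.up = nn.Linear(bottleneck, text_dim)

    def forward(self, speech_states):
        # project speech-encoder states into the text decoder's input space
        return self.up(torch.relu(self.down(speech_states)))

adaptor = Adaptor()
decoder_inputs = adaptor(torch.randn(2, 50, 1024))  # (batch, frames, speech_dim)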
Multi-task Learning for Multilingual Neural Machine Translation
TLDR: Proposes a multi-task learning (MTL) framework that jointly trains the model on the translation task with bitext data and on two denoising tasks with monolingual data, and shows the effectiveness of MTL over pre-training approaches for both NMT and cross-lingual NLU transfer tasks.
Self-supervised and Supervised Joint Training for Resource-rich Machine Translation
TLDR: F2-XEnDec, a joint training approach that combines self-supervised and supervised learning to optimize NMT models; it achieves substantial improvements over several strong baselines and reaches a new state of the art of 46.19 BLEU on English-French when incorporating back-translation.
Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
TLDR: Proposes SixT, a simple yet effective model that significantly outperforms mBART, a pretrained multilingual encoder-decoder model explicitly designed for NMT, with an average improvement of 7.1 BLEU on zero-shot any-to-English test sets across 14 source languages.
Multilingual Translation via Grafting Pre-trained Language Models
TLDR: Proposes Graformer, which grafts separately pre-trained (masked) language models for machine translation, using monolingual data for pre-training and parallel data for the grafting stage so as to take maximal advantage of both types of data.
DEEP: DEnoising Entity Pre-training for Neural Machine Translation
Junjie Hu, Hiroaki Hayashi, Kyunghyun Cho, Graham Neubig. ArXiv, 2021.
TLDR: Proposes DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named-entity translation accuracy within sentences; experiments demonstrate significant improvements over strong denoising auto-encoding baselines.
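As a small illustration of that entity-denoising idea, the sketch below masks entity mentions found in a toy, entirely hypothetical knowledge base so a model could be trained to reconstruct them; the token string and example sentence are invented.

ENTITY_KB = {"Kathmandu", "Nepal"}  # toy stand-in for a real knowledge base of entity names
ENTITY_MASK = "<ent>"               # placeholder mask token

def noise_entities(sentence: str):
    """Return (noised input, reconstruction target) for entity denoising."""
    noised = sentence
    for entity in ENTITY_KB:
        noised = noised.replace(entity, ENTITY_MASK)
    return noised, sentence

print(noise_entities("Kathmandu is the capital of Nepal ."))
# ('<ent> is the capital of <ent> .', 'Kathmandu is the capital of Nepal .')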

References

Showing 1-10 of 118 references.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
TLDR: Presents BART, a denoising autoencoder for pre-training sequence-to-sequence models; it matches the performance of RoBERTa on GLUE and SQuAD and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
Improving Neural Machine Translation Models with Monolingual Data
TLDR: Pairs monolingual training data with automatic back-translations so it can be treated as additional parallel training data, obtaining substantial improvements on the WMT 15 English-German task and the low-resource IWSLT 14 Turkish->English task.
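The back-translation recipe is compact enough to sketch. ReverseModel.translate below is a stand-in for a real target-to-source decoder (here it just tags the input so the example runs), and the toy sentences are invented; the human-written side stays on the target so the decoder learns from clean text.

class ReverseModel:
    def translate(self, sentence: str) -> str:
        return "<bt> " + sentence  # placeholder for a real target-to-source translation

def back_translate(reverse_model, target_monolingual):
    """Pair target-side monolingual sentences with synthetic source sentences."""
    return [(reverse_model.translate(t), t) for t in target_monolingual]

parallel = [("Guten Morgen", "Good morning")]
mono_en = ["This sentence has no German source."]
# the synthetic pairs are simply appended to the real bitext before training
training_data = parallel + back_translate(ReverseModel(), mono_en)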
Unsupervised Neural Machine Translation
TLDR: Proposes a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora: a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and back-translation.
Pretrained Language Models for Document-Level Neural Machine Translation
TLDR: Investigates using large contexts for document-level NMT: unlike previous work that pre-trained models on large-scale sentence-level parallel corpora, it leverages pretrained language models such as BERT that are trained on monolingual documents, and proposes context-manipulation methods to control the influence of large contexts.
Unsupervised Pretraining for Sequence to Sequence Learning
TLDR: Presents a general unsupervised learning method that improves the accuracy of sequence-to-sequence (seq2seq) models by initializing the weights of the encoder and decoder with the weights of two pretrained language models and then fine-tuning with labeled data.
MASS: Masked Sequence to Sequence Pre-training for Language Generation
TLDR: Proposes MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks; it achieves state-of-the-art accuracy on unsupervised English-French translation, even beating an early attention-based supervised model.
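A minimal sketch of the MASS objective as described here, assuming the roughly-half-sentence fragment length reported for MASS and a placeholder mask token: the encoder sees the sentence with a contiguous fragment masked out, and the decoder is trained to predict only that fragment.

import random

MASK = "<mask>"  # placeholder mask token

def mass_example(tokens, frag_ratio=0.5):
    """Build one MASS-style example: (encoder input, decoder target fragment)."""
    k = max(1, int(len(tokens) * frag_ratio))    # fragment length, about half the sentence
    start = random.randint(0, len(tokens) - k)   # random contiguous span
    enc_input = tokens[:start] + [MASK] * k + tokens[start + k:]
    dec_target = tokens[start:start + k]         # decoder predicts only the masked words
    return enc_input, dec_target

enc, dec = mass_example("we pre-train the encoder and the decoder jointly".split())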
Phrase-Based & Neural Unsupervised Machine Translation
TLDR: Investigates how to learn to translate with access only to large monolingual corpora in each language, proposing two model variants, a neural and a phrase-based model, that are significantly better than methods from the literature while being simpler and having fewer hyper-parameters.
Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation
TLDR: The proposed extract-and-edit approach, which extracts and then edits real sentences from the target monolingual corpora, consistently outperforms previous state-of-the-art unsupervised machine translation systems across two benchmarks and two low-resource language pairs.
Unsupervised Machine Translation Using Monolingual Corpora Only
TLDR: Proposes a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space, effectively learning to translate without using any labeled data.
Pre-trained language model representations for language generation
TLDR: Examines different strategies for integrating pre-trained representations into sequence-to-sequence models, applies them to neural machine translation and abstractive summarization, and finds that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%.