Corpus ID: 235651933

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

@article{Ma2021DeltaLMEP,
  title={DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders},
  author={Shuming Ma and Li Dong and Shaohan Huang and Dongdong Zhang and Alexandre Muzio and Saksham Singhal and Hany Hassan Awadalla and Xia Song and Furu Wei},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.13736}
}
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, where the pretrained encoders can only benefit part of it. To reduce this gap, we introduce DeltaLM (∆LM), a pretrained multilingual encoder-decoder model that regards the decoder as the task layer of off-the-shelf pretrained encoders. Specifically…
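The core idea of treating the decoder as a task layer on top of an off-the-shelf pretrained encoder can be illustrated with a short sketch. The code below is a minimal, hedged illustration using PyTorch's built-in Transformer modules and XLM-R-base-style dimensions; the layer names, sizes, and the decoder structure are assumptions for illustration, not the authors' released implementation, and only the weight-reuse step is shown.

```python
# Minimal sketch (not the authors' code): build an encoder-decoder model whose
# decoder reuses the weights of an off-the-shelf pretrained multilingual encoder.
# Dimensions follow an XLM-R-base-style encoder and are illustrative assumptions.
import torch.nn as nn

d_model, nhead, ffn_dim, num_layers = 768, 12, 3072, 12
vocab_size = 250_000  # placeholder for a large multilingual vocabulary

# Stand-in for the pretrained multilingual encoder; in practice its weights
# would be loaded from a released checkpoint rather than randomly initialized.
pretrained_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead, ffn_dim, batch_first=True),
    num_layers,
)

# Decoder with matching width and depth. Its self-attention and feed-forward
# blocks are initialized from the encoder; cross-attention starts from scratch.
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, ffn_dim, batch_first=True),
    num_layers,
)

for enc_layer, dec_layer in zip(pretrained_encoder.layers, decoder.layers):
    # Reuse the pretrained self-attention weights.
    dec_layer.self_attn.load_state_dict(enc_layer.self_attn.state_dict())
    # Reuse the pretrained feed-forward weights.
    dec_layer.linear1.load_state_dict(enc_layer.linear1.state_dict())
    dec_layer.linear2.load_state_dict(enc_layer.linear2.state_dict())
    # dec_layer.multihead_attn (the cross-attention block) keeps its random
    # initialization: it is the genuinely new part that subsequent
    # pre-training on monolingual and parallel data has to learn.

# Output projection over the shared multilingual vocabulary.
output_proj = nn.Linear(d_model, vocab_size)
```

The paper pairs this kind of weight reuse with further encoder-decoder pre-training so that the decoder adapts to generation; only the initialization step is sketched above.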

Citations

mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
TLDR
Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5; the work also proposes a partially non-autoregressive objective for text-to-text pre-training.
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
TLDR
This paper introduces ELECTRA-style tasks and pretrains the model, named XLM-E, on both multilingual and parallel corpora, and shows that it outperforms the baseline models on various cross-lingual understanding tasks with much less computation cost.
Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training
  • Bo Zheng, Li Dong, +5 authors Furu Wei
  • Computer Science
  • 2021
Compared to monolingual models, cross-lingual models usually require a more expressive vocabulary to represent all languages adequately. We find that many languages are under-represented in recent …

References

SHOWING 1-10 OF 34 REFERENCES
Cross-Lingual Natural Language Generation via Pre-Training
TLDR
Experimental results on question generation and abstractive summarization show that the model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation and improves NLG performance of low-resource languages by leveraging rich-resource language data.
MASS: Masked Sequence to Sequence Pre-training for Language Generation
TLDR
This work proposes MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks, which achieves the state-of-the-art accuracy on the unsupervised English-French translation, even beating the early attention-based supervised model.
Multilingual Denoising Pre-training for Neural Machine Translation
Abstract
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a …
Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
TLDR
It is found that fine-tuning on multiple languages together brings further improvement in Unicoder, a universal language encoder that is insensitive to different languages.
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
TLDR
This work presents XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data, and explains its effectiveness for machine translation.
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR
A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks, and that compares favorably with BERT on the GLUE benchmark and the SQuAD 2.0 and CoQA question answering tasks.
mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
TLDR
Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5; the work also proposes a partially non-autoregressive objective for text-to-text pre-training.
VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation
TLDR
A variable encoder-decoder (VECO) pre-training approach to unify the two mainstreams in both model architectures and pre-training tasks, which delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
Multi-task Learning for Multilingual Neural Machine Translation
TLDR
This work proposes a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data, and shows the effectiveness of MTL over pre-training approaches for both NMT and cross-lingual transfer learning NLU tasks.
Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
TLDR
This work shows that multilingual translation models can be created through multilingual finetuning, and demonstrates that pretrained models can be extended to incorporate additional languages without loss of performance.