Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders

@inproceedings{Chen2021ZeroShotCT,
  title={Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders},
  author={Guanhua Chen and Shuming Ma and Yun Chen and Li Dong and Dongdong Zhang and Jianxiong Pan and Wenping Wang and Furu Wei},
  booktitle={EMNLP},
  year={2021}
}
Previous work mainly focuses on improving cross-lingual transfer for NLU tasks with a multilingual pretrained encoder (MPE), or on improving the performance of supervised machine translation with BERT. However, whether the MPE can help facilitate the cross-lingual transferability of an NMT model remains under-explored. In this paper, we focus on a zero-shot cross-lingual transfer task in NMT. In this task, the NMT model is trained with a parallel dataset of only one language pair and an off-the…
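The setup described in the abstract, an encoder-decoder NMT model whose encoder is initialized from an off-the-shelf MPE and trained on a single language pair, can be illustrated with a short sketch. This is a minimal, illustrative sketch only, assuming the Hugging Face transformers library with XLM-R as the MPE and a randomly initialized PyTorch Transformer decoder; the class name, hyperparameters, and toy training step are assumptions, not the paper's released code.

# Minimal sketch (not the paper's implementation): an NMT model whose encoder is an
# off-the-shelf multilingual pretrained encoder (XLM-R assumed here) and whose
# decoder is a randomly initialized Transformer decoder. It is trained on a single
# language pair; at test time, sentences from unseen source languages are fed
# through the same multilingual encoder for zero-shot transfer.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MPEncoderNMT(nn.Module):
    def __init__(self, vocab_size, mpe_name="xlm-roberta-base",
                 d_model=768, n_heads=8, n_layers=6):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(mpe_name)  # multilingual pretrained encoder
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)  # trained from scratch
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source with the (optionally frozen) multilingual encoder.
        memory = self.encoder(input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)
        # Standard causal mask for autoregressive decoding.
        tgt_len = tgt_ids.size(1)
        causal = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal,
                              memory_key_padding_mask=~src_mask.bool())
        return self.out_proj(hidden)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = MPEncoderNMT(vocab_size=len(tokenizer))
src = tokenizer(["Guten Morgen!"], return_tensors="pt", padding=True)  # supervised pair: de -> en
tgt = tokenizer(["Good morning!"], return_tensors="pt")["input_ids"]
logits = model(src["input_ids"], src["attention_mask"], tgt[:, :-1])   # teacher forcing
loss = nn.CrossEntropyLoss()(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))

Sharing the multilingual tokenizer and embedding space across source and target is what makes zero-shot inputs in other languages well-formed for the encoder; the decoder itself only ever sees the single target language.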
Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation
TLDR
SixT+ is presented, a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages, and it offers a set of model parameters that can be further fine-tuned for other unsupervised tasks.
Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
TLDR
SixT+, a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages, is presented, and the factors contributing to its performance are analyzed, including the multilinguality of the auxiliary parallel data, the positional disentangled encoder, and the cross-lingual transferability of its encoder.
Deep Fusing Pre-trained Models into Neural Machine Translation
TLDR
A novel framework is proposed to deeply fuse pre-trained representations into NMT, fully exploring the potential of PTMs in NMT, and it outperforms previous work in both autoregressive and non-autoregressive NMT models.
Zero-shot Cross-lingual Conversational Semantic Role Labeling
TLDR
The performance of non-Chinese conversational tasks, such as question-in-context rewriting in English and multi-turn dialogue response generation in English, German and Japanese, is improved by incorporating CSRL information into the downstream conversation-based models, demonstrating the usefulness of CSRL beyond Chinese.
Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model
TLDR
This work proposes a simple refinement procedure to disentangle languages from a pre-trained multilingual UMT model for it to focus on only the target low-resource task.
Language Models are Few-shot Multilingual Learners
TLDR
It is shown that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, and that they are competitive with existing state-of-the-art cross-lingual and translation models.
OCR Improves Machine Translation for Low-Resource Languages
TLDR
It is shown that OCR monolingual data is a valuable resource that can increase the performance of Machine Translation models when used in backtranslation, and the minimum level of OCR quality needed for the monolingual data to be useful for Machine Translation is investigated.

References

SHOWING 1-10 OF 48 REFERENCES
Cross-Lingual Natural Language Generation via Pre-Training
TLDR
Experimental results on question generation and abstractive summarization show that the model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation and improves NLG performance of low-resource languages by leveraging rich-resource language data.
Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
TLDR
It is argued that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures.
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
TLDR
The benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT), are investigated.
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
TLDR
This work presents XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data, and explains its effectiveness for machine translation.
Multilingual Denoising Pre-training for Neural Machine Translation
Abstract
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective.
Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information
TLDR
This is the first work to verify that multiple low-resource language pairs can be utilized to improve rich-resource MT, and mRASP is even able to improve translation quality on exotic languages that never occur in the pre-training corpus.
Cross-lingual Retrieval for Iterative Self-Supervised Training
TLDR
This work finds that cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs, and develops a new approach, cross-lingual retrieval for iterative self-supervised training (CRISS), in which mining and training are applied iteratively, improving cross-lingual alignment and translation ability at the same time.
Improving Zero-Shot Translation by Disentangling Positional Information
TLDR
Thorough inspection of the hidden layer outputs shows that the proposed approach indeed leads to more language-independent representations and allows easy integration of new languages, which substantially expands translation coverage.
Acquiring Knowledge from Pre-trained Model to Neural Machine Translation
TLDR
An APT framework is proposed for acquiring knowledge from pre-trained models for NMT, which includes a dynamic fusion mechanism to fuse task-specific features adapted from general knowledge into the NMT network, and a knowledge distillation paradigm to learn language knowledge continuously during the NMT training process.
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
TLDR
This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language (a minimal sketch of this token trick follows the reference list below).
...
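As noted in the entry on Google's multilingual NMT system above, the target language is selected by prepending an artificial token to the source sentence. A minimal sketch of this trick follows; the "<2es>" token format matches the examples given in that paper, while the helper function name is purely illustrative.

# Minimal sketch of the target-language token trick: prepend an artificial token
# such as "<2es>" to the source sentence so that a single multilingual NMT model
# knows which target language to produce. The helper name is illustrative.
def add_target_token(source_sentence: str, target_lang: str) -> str:
    return f"<2{target_lang}> {source_sentence}"

print(add_target_token("How are you?", "es"))  # -> "<2es> How are you?"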