Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
@inproceedings{Chen2021ZeroShotCT,
  title     = {Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders},
  author    = {Guanhua Chen and Shuming Ma and Yun Chen and Li Dong and Dongdong Zhang and Jianxiong Pan and Wenping Wang and Furu Wei},
  booktitle = {EMNLP},
  year      = {2021}
}
Previous work mainly focuses on improving cross-lingual transfer for NLU tasks with a multilingual pretrained encoder (MPE), or on improving the performance of supervised machine translation with BERT. However, whether the MPE can help facilitate the cross-lingual transferability of an NMT model remains under-explored. In this paper, we focus on a zero-shot cross-lingual transfer task in NMT. In this task, the NMT model is trained with a parallel dataset of only one language pair and an off-the…
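The setup described in the abstract can be summarized as: the NMT encoder is initialized from an off-the-shelf MPE, the decoder is trained from scratch on parallel data for a single language pair, and at test time the same encoder is fed source sentences in languages never seen in the parallel data. The sketch below is a minimal illustration of that recipe, not the authors' released code; the choice of XLM-R as the MPE, the Hugging Face transformers API, and all sizes and hyperparameters are assumptions made for the example.

```python
# Minimal sketch: MPE-initialized encoder + randomly initialized decoder,
# trained on one language pair, queried zero-shot with another language.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MPEInitializedNMT(nn.Module):
    def __init__(self, mpe_name="xlm-roberta-base", vocab_size=250002,
                 d_model=768, n_layers=6, n_heads=12):
        super().__init__()
        # Encoder: an off-the-shelf multilingual pretrained encoder (MPE).
        self.encoder = AutoModel.from_pretrained(mpe_name)
        # Decoder: a randomly initialized Transformer decoder.
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source with the MPE and keep its multilingual representations.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)
        seq_len = tgt_ids.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")),
                            diagonal=1)
        dec = self.decoder(tgt, memory, tgt_mask=causal,
                           memory_key_padding_mask=(src_mask == 0))
        return self.out_proj(dec)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = MPEInitializedNMT()

# Supervised training on a single language pair, e.g. German -> English.
de = tokenizer(["Das ist ein Test."], return_tensors="pt", padding=True)
en = tokenizer(["This is a test."], return_tensors="pt", padding=True)
logits = model(de["input_ids"], de["attention_mask"], en["input_ids"][:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   en["input_ids"][:, 1:].reshape(-1))

# Zero-shot cross-lingual transfer at test time: a source language that never
# appeared in the parallel data (e.g. Finnish) goes through the same encoder.
fi = tokenizer(["Tämä on testi."], return_tensors="pt", padding=True)
```

Because the MPE already maps many languages into a shared representation space, the decoder trained on one source language can, to some degree, be reused for unseen source languages; how well this works is exactly what the paper studies.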
6 Citations
Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation
- Computer Science · ACL · 2022
SixT+ is presented: a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages, and whose model parameters can be further fine-tuned for other unsupervised tasks.
Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
- Computer Science · ArXiv · 2021
SixT+, a strong many-to-English NMT model that supports 100 source languages but is trained with a parallel dataset in only six source languages, is presented, and its key ingredients are analyzed: the multilinguality of the auxiliary parallel data, a positional disentangled encoder, and the cross-lingual transferability of its encoder.
Zero-shot Cross-lingual Conversational Semantic Role Labeling
- Computer Science · ArXiv · 2022
Non-Chinese conversational tasks, such as question-in-context rewriting in English and multi-turn dialogue response generation in English, German, and Japanese, are improved by incorporating CSRL information into the downstream conversation-based models, demonstrating the usefulness of CSRL beyond Chinese.
Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model
- Computer Science · ArXiv · 2022
This work proposes a simple refinement procedure to disentangle languages from a pre-trained multilingual UMT model for it to focus on only the target low-resource task.
Language Models are Few-shot Multilingual Learners
- Computer Science, Linguistics · MRL · 2021
It is shown that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, and that they are competitive with existing state-of-the-art cross-lingual models and translation models.
OCR Improves Machine Translation for Low-Resource Languages
- Computer Science · FINDINGS · 2022
It is shown that OCR-derived monolingual data is a valuable resource that can increase the performance of Machine Translation models when used in backtranslation, and the minimum level of OCR quality needed for the monolingual data to be useful for Machine Translation is investigated.
References
Showing 1-10 of 48 references
Cross-Lingual Natural Language Generation via Pre-Training
- Computer Science · AAAI · 2020
Experimental results on question generation and abstractive summarization show that the model outperforms the machine-translation-based pipeline methods for zero-shot cross-lingual generation and improves NLG performance of low-resource languages by leveraging rich-resource language data.
Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
- Computer Science · ACL · 2020
It is argued that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and this bottleneck is overcome via language-specific components and deepened NMT architectures.
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
- Computer Science · EACL · 2021
The benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT), are investigated.
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
- Computer Science · ArXiv · 2020
This work presents XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data, and explains its effectiveness for machine translation.
Multilingual Denoising Pre-training for Neural Machine Translation
- Computer Science · Transactions of the Association for Computational Linguistics · 2020
Abstract This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a…
Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information
- Computer Science · EMNLP · 2020
This work is the first to verify that multiple low-resource language pairs can be utilized to improve rich-resource MT, and mRASP is even able to improve translation quality on exotic languages that never occur in the pre-training corpus.
Cross-lingual Retrieval for Iterative Self-Supervised Training
- Computer Science · NeurIPS · 2020
This work finds that cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs, and develops a new approach, cross-lingual retrieval for iterative self-supervised training (CRISS), where mining and training are applied iteratively, improving cross-lingual alignment and translation ability at the same time.
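The mining step referenced in this entry can be illustrated with a small sketch. The code below is my assumption of the general recipe (nearest-neighbor retrieval over sentence embeddings with a similarity threshold), not necessarily the exact CRISS mining criterion; the function name and threshold are made up for the example.

```python
# Toy sketch of mining parallel sentence pairs from the model's own encoder
# outputs: embed sentences from two languages, then pair each source sentence
# with its nearest-neighbor target sentence by cosine similarity.
import numpy as np

def mine_pairs(src_emb: np.ndarray, tgt_emb: np.ndarray, threshold: float = 0.8):
    # Normalize rows so dot products become cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T                      # (n_src, n_tgt) similarity matrix
    best = sims.argmax(axis=1)              # nearest target for each source
    return [(i, int(j)) for i, j in enumerate(best) if sims[i, j] >= threshold]

# Mined pairs would then be added to the training data, and mining repeated
# with the improved encoder -- the iterative part of the procedure.
```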
Improving Zero-Shot Translation by Disentangling Positional Information
- Computer Science · ACL · 2021
By thorough inspections of the hidden layer outputs, it is shown that the proposed approach indeed leads to more language-independent representations, and allows easy integration of new languages, which substantially expands translation coverage.
Acquiring Knowledge from Pre-trained Model to Neural Machine Translation
- Computer Science · AAAI · 2020
An Apt framework is proposed for acquiring knowledge from pre-trained models for NMT; it includes a dynamic fusion mechanism to fuse task-specific features adapted from general knowledge into the NMT network, and a knowledge distillation paradigm to learn language knowledge continuously during the NMT training process.
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
- Computer Science · TACL · 2017
This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
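The artificial target-language token mentioned in this entry is simple enough to show directly. The snippet below is a toy illustration, not Google's code; the `<2xx>` token format follows the style of the paper's examples, and the helper function name is hypothetical.

```python
# Prepend an artificial token such as "<2es>" to the source sentence so a
# single multilingual NMT model knows which target language to produce.
def add_target_token(src_sentence: str, tgt_lang: str) -> str:
    return f"<2{tgt_lang}> {src_sentence}"

print(add_target_token("How are you?", "es"))   # "<2es> How are you?"
print(add_target_token("How are you?", "ja"))   # "<2ja> How are you?"
```

Because the target language is specified only by this token, the same model can be asked to translate between pairs it never saw together in training, which is what enables zero-shot translation in that system.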