Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

@inproceedings{Gu2019ImprovedZN,
  title={Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations},
  author={Jiatao Gu and Yong Wang and Kyunghyun Cho and Victor O. K. Li},
  booktitle={ACL},
  year={2019}
}
Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem…
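For contrast with the zero-shot setup described above, the sketch below illustrates the pivot-based baseline the abstract mentions: two separately trained systems are chained, translating source-to-pivot and then pivot-to-target. The `translate` helper and the model arguments are hypothetical placeholders, not part of the paper.

```python
# Minimal sketch of pivot-based translation ("translate twice" through a third
# language). `translate` is a hypothetical stand-in for decoding with any
# trained NMT model; the model objects and language choices are illustrative.

def translate(model, sentence: str) -> str:
    """Placeholder: decode `sentence` with a trained NMT model."""
    raise NotImplementedError

def pivot_translate(src_sentence: str, src2pivot_model, pivot2tgt_model) -> str:
    pivot_sentence = translate(src2pivot_model, src_sentence)  # e.g. de -> en
    return translate(pivot2tgt_model, pivot_sentence)          # e.g. en -> fr
```

A zero-shot multilingual system instead decodes the unseen direction (e.g. de -> fr) in a single step, avoiding the doubled decoding cost of the two-step pipeline.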
Self-Learning for Zero Shot Neural Machine Translation
TLDR: This work proposes a novel zero-shot NMT modeling approach that learns without the now-standard assumption of a pivot language sharing parallel data with the zero-shot source and target languages, and shows consistent improvements even in a domain-mismatch setting.
Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
TLDR: It is argued that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and this bottleneck is overcome via language-specific components and deepened NMT architectures.
Language Tags Matter for Zero-Shot Neural Machine Translation
TLDR: It is demonstrated that a proper language-tag (LT) strategy can enhance the consistency of semantic representations and alleviate the off-target issue in zero-shot directions.
Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation
TLDR: It is found that language-specific subword segmentation results in less subword copying at training time and leads to better zero-shot performance than jointly trained segmentation, and that the bias towards English can be effectively reduced with even a small amount of parallel data in some of the non-English pairs.
Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation
TLDR: This work proposes an effective transfer learning approach based on cross-lingual pre-training to enable a smooth transition for zero-shot translation, and it significantly outperforms a strong pivot-based baseline and various multilingual NMT approaches.
Zero-Shot Paraphrase Generation with Multilingual Language Models
TLDR: This paper proposes a simple and unified paraphrasing model, trained purely on multilingual parallel data, that can conduct zero-shot paraphrase generation in one step and surpasses the pivoting method in terms of relevance, diversity, fluency, and efficiency.
Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models
TLDR: An in-depth evaluation of the translation performance of different models, highlighting the trade-offs between methods of sharing decoder parameters, finds that models with task-specific decoder parameters outperform models whose decoder parameters are fully shared across all tasks.
Iterative Multilingual Neural Machine Translation for Less-Common and Zero-Resource Language Pairs
TLDR: A simple iterative training-generating-filtering-training process that utilizes all available pivot parallel data to generate synthetic data for unseen directions is proposed, along with a filtering method based on word alignments and the longest parallel phrase to filter out noisy sentence pairs from the synthetic data.
Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders
TLDR: This work proposes to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers between language-specific and interlingua, and introduces a denoising auto-encoding (DAE) objective to jointly train the model with the translation task in a multi-task manner.
Language Models are Good Translators
TLDR: It is demonstrated that a single language model (LM4MT) can achieve performance comparable to strong encoder-decoder NMT models on standard machine translation benchmarks, using the same training data and a similar number of model parameters.

References

The Missing Ingredient in Zero-Shot Neural Machine Translation
TLDR: This paper first diagnoses why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs, and proposes auxiliary losses on the NMT encoder that impose representational invariance across languages.
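As a rough illustration of the kind of auxiliary loss this entry describes (the exact formulation in the paper may differ), an encoder-level invariance term can be added to the usual translation loss; here $\bar{h}(\cdot)$ is an assumed pooled encoder representation of a sentence, $d$ a distance, and $\lambda$ a weight:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{NMT}}(x, y) \;+\; \lambda\, d\big(\bar{h}(x),\, \bar{h}(y)\big)
```

Minimizing the second term pushes encoder representations of parallel sentences in different languages toward each other, which is the representational invariance the entry refers to.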
Zero-Shot Dual Machine Translation
TLDR: Experiments show that a zero-shot dual system, trained on English-French and English-Spanish, outperforms a standard NMT system by large margins in zero-shot translation performance on Spanish-French (both directions).
Maximum Expected Likelihood Estimation for Zero-resource Neural Machine Translation
TLDR: An approach to zero-resource NMT via maximum expected likelihood estimation is proposed, which maximizes the likelihood of the intended source-to-target model in expectation over a pivot-to-source translation model, using a pivot-target parallel corpus.
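A minimal sketch of an expected-likelihood objective of this shape, assuming pivot-target pairs $(z, y)$, a pivot-to-source model $P(x \mid z; \theta_{z \to x})$, and the intended source-to-target model $P(y \mid x; \theta_{x \to y})$; the notation is illustrative rather than the paper's own:

```latex
J(\theta_{x \to y}) \;=\; \sum_{(z,\,y)} \mathbb{E}_{x \sim P(x \mid z;\; \theta_{z \to x})}\big[\log P(y \mid x;\; \theta_{x \to y})\big]
```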
Phrase-Based & Neural Unsupervised Machine Translation
TLDR: This work investigates how to learn to translate when having access to only large monolingual corpora in each language, and proposes two model variants, a neural and a phrase-based model, which are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters.
Unsupervised Machine Translation Using Monolingual Corpora Only
TLDR: This work proposes a model that takes sentences from monolingual corpora in two different languages, maps them into the same latent space, and effectively learns to translate without using any labeled data.
Unsupervised Neural Machine Translation
TLDR: This work proposes a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora; it consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and back-translation.
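Schematically, the training signal described here combines a denoising auto-encoding term with a back-translation term. Writing $C(\cdot)$ for a noising function and $\hat{y}(x)$ for the model's own translation of $x$ into the other language, a loss of roughly the following shape is minimized (the weighting and exact terms are assumptions here):

```latex
\mathcal{L} \;=\; \underbrace{\mathbb{E}_{x}\big[-\log P(x \mid C(x))\big]}_{\text{denoising}} \;+\; \underbrace{\mathbb{E}_{x}\big[-\log P(x \mid \hat{y}(x))\big]}_{\text{back-translation}}
```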
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
TLDR: This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
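The artificial-token mechanism mentioned here is simple enough to sketch; the `<2xx>` token format and helper name below are illustrative and may not match the paper's exact tokens:

```python
# Hedged sketch of target-language tagging for a single multilingual NMT model:
# the model is steered toward a target language by prepending an artificial
# token to the source sentence. Token format and names are assumptions.

def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend an artificial target-language token to the source sentence."""
    return f"<2{target_lang}> {sentence}"

print(tag_source("How are you?", "es"))      # "<2es> How are you?"
print(tag_source("Wie geht es dir?", "fr"))  # a zero-shot direction if de-fr was never seen in training
```

The same shared encoder-decoder then handles every direction, which is what makes zero-shot requests (tagging a source sentence with a target language it was never paired with in training) possible at all.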
Contextual Parameter Generation for Universal Neural Machine Translation
TLDR: This approach requires no changes to the model architecture of a standard NMT system, but instead introduces a new component, the contextual parameter generator (CPG), that generates the parameters of the system (e.g., weights in a neural network).
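A hypothetical sketch of what a contextual parameter generator can look like, assuming the context is a pair of language embeddings and the generated parameters are a single layer's weight matrix; sizes, wiring, and names are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class ContextualParameterGenerator(nn.Module):
    """Illustrative CPG: maps source/target language embeddings to the flattened
    weights of one target layer of an otherwise standard NMT model."""

    def __init__(self, num_langs, lang_dim, target_shape):
        super().__init__()
        self.lang_embed = nn.Embedding(num_langs, lang_dim)
        self.generator = nn.Linear(lang_dim, target_shape[0] * target_shape[1])
        self.target_shape = target_shape

    def forward(self, src_lang, tgt_lang):
        # Combine language embeddings into a context vector, then emit the
        # parameters of the base layer conditioned on that language pair.
        context = self.lang_embed(src_lang) + self.lang_embed(tgt_lang)
        return self.generator(context).view(*self.target_shape)

# Usage: generate one 16x16 weight matrix for a (hypothetical) en->fr pair.
cpg = ContextualParameterGenerator(num_langs=4, lang_dim=8, target_shape=(16, 16))
weights = cpg(torch.tensor(0), torch.tensor(1))  # language indices are illustrative
print(weights.shape)  # torch.Size([16, 16])
```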
Neural Machine Translation with Pivot Languages
TLDR: This work introduces a joint training algorithm for pivot-based neural machine translation and proposes three methods to connect the two models and enable them to interact with each other during training.
Transfer Learning for Low-Resource Neural Machine Translation
TLDR: A transfer learning method is presented that significantly improves BLEU scores across a range of low-resource languages by first training a high-resource language pair, then transferring some of the learned parameters to the low-resource pair to initialize and constrain training.
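A minimal sketch of the parent-to-child parameter transfer this entry describes, assuming PyTorch-style state dicts; which parameters are copied (and whether any are frozen during fine-tuning) is an assumption here, not the paper's exact prescription:

```python
import torch

def transfer_parameters(parent_state: dict, child_model: torch.nn.Module,
                        keep_prefixes=("encoder.", "decoder.")) -> None:
    """Initialize a low-resource (child) model from a trained high-resource
    (parent) model by copying shape-compatible parameters."""
    child_state = child_model.state_dict()
    for name, tensor in parent_state.items():
        if name.startswith(keep_prefixes) and name in child_state \
                and child_state[name].shape == tensor.shape:
            child_state[name] = tensor.clone()  # start the child from the parent's weights
    child_model.load_state_dict(child_state)
    # Fine-tuning on the low-resource pair then starts from (and is constrained
    # by) the transferred parameters.
```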