Corpus ID: 201671351

Learning a Multitask Curriculum for Neural Machine Translation

Authors: Wei Wang, Ye Tian, Jiquan Ngiam, Yinfei Yang, Isaac Caswell, Zarana Parekh
Existing curriculum learning research in neural machine translation (NMT) mostly focuses on a single final task, such as selecting data for a domain or for denoising, and considers in-task example selection. This paper studies the data selection problem in a multitask setting. We present a method to learn a multitask curriculum on a single, diverse, potentially noisy training dataset. It computes multiple data selection scores for each training example, each score measuring how useful the example… 
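The abstract is cut off before it explains how the per-example scores become a curriculum. As a purely illustrative sketch, assume each score is a number where higher means more useful for one task, and that the scores are combined by a fixed weighted sum before keeping the top-scoring fraction of the data. The functions `combine_scores` and `select_top`, the task names, and the linear weighting are all hypothetical; this is not the paper's method for learning the combination.

```python
from typing import Dict, List, Sequence


def combine_scores(scores: Dict[str, Sequence[float]],
                   weights: Dict[str, float]) -> List[float]:
    """Hypothetical: weighted sum of per-task selection scores for each example."""
    names = list(scores)
    if not names:
        return []
    n = len(scores[names[0]])
    if any(len(scores[name]) != n for name in names):
        raise ValueError("every task must score the same number of examples")
    return [sum(weights[name] * scores[name][i] for name in names) for i in range(n)]


def select_top(combined: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices (in corpus order) of the highest-scoring fraction of examples."""
    if not combined:
        return []
    k = max(1, int(round(len(combined) * keep_ratio)))
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:k])


# Toy usage with two made-up tasks and four examples.
scores = {"denoising": [0.9, 0.1, 0.5, 0.7], "in_domain": [0.2, 0.8, 0.6, 0.1]}
weights = {"denoising": 0.5, "in_domain": 0.5}
print(select_top(combine_scores(scores, weights), keep_ratio=0.5))  # [0, 2]
```

In an actual curriculum, the weights or `keep_ratio` would be expected to change over the course of training rather than stay fixed as they do here.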
Token-wise Curriculum Learning for Neural Machine Translation
A novel token-wise curriculum learning approach that creates sufficient amounts of easy samples for low-resource languages by learning to predict a short sub-sequence from the beginning part of each target sentence at the early stage of training.
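A minimal sketch of the prefix idea, assuming a linear schedule: early in training only the first part of each target sentence is used as the training target, and the prefix grows until it covers the whole sentence. The linear schedule, the `min_frac` starting fraction, and the function names are assumptions for illustration, not the paper's settings.

```python
from typing import List, Sequence


def prefix_length(step: int, total_steps: int, target_len: int,
                  min_frac: float = 0.2) -> int:
    """Number of leading target tokens to keep at `step` (assumed linear schedule)."""
    if total_steps <= 0:
        return target_len
    progress = min(1.0, max(0.0, step / total_steps))
    frac = min_frac + (1.0 - min_frac) * progress
    return min(target_len, max(1, int(round(frac * target_len))))


def truncate_target(target: Sequence[str], step: int, total_steps: int,
                    min_frac: float = 0.2) -> List[str]:
    """Training target at `step`: a prefix of the reference that grows over time."""
    return list(target[:prefix_length(step, total_steps, len(target), min_frac)])


target = "we study a token-wise curriculum for low-resource machine translation tasks".split()
print(truncate_target(target, step=0, total_steps=100))         # ['we', 'study']
print(len(truncate_target(target, step=100, total_steps=100)))  # 10
```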
Self-Induced Curriculum Learning in Neural Machine Translation
An in-depth analysis of the sampling choices the SS-NMT model makes during training is provided, showing that, without being explicitly told to do so, the model selects samples of increasing complexity and task relevance in combination with a denoising curriculum.


Reinforcement Learning based Curriculum Optimization for Neural Machine Translation
This work uses reinforcement learning to learn a curriculum automatically, jointly with the NMT system, in the course of a single training run, and shows that this approach can outperform uniform baselines on the ParaCrawl and WMT English-to-French datasets.
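The summary does not say which RL formulation is used. As a simplified stand-in (not the method of that paper), the sketch treats curriculum choice as a multi-armed bandit: the training data is assumed to be pre-split into bins (for example by a noise score), an epsilon-greedy policy picks the bin for the next batch, and the policy is updated with a reward. The class name, the binning, and the reward are assumptions; in a real system the reward might come from a change in validation performance, which is only faked here.

```python
import random
from typing import List


class EpsilonGreedyCurriculum:
    """Picks which data bin to sample the next batch from (bandit stand-in for RL)."""

    def __init__(self, num_bins: int, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.counts: List[int] = [0] * num_bins
        self.values: List[float] = [0.0] * num_bins  # running mean reward per bin
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Try every bin once before exploiting.
        for b, c in enumerate(self.counts):
            if c == 0:
                return b
        # Explore with probability epsilon, otherwise exploit the best bin so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda b: self.values[b])

    def update(self, bin_id: int, reward: float) -> None:
        # Incremental update of the mean reward observed for this bin.
        self.counts[bin_id] += 1
        self.values[bin_id] += (reward - self.values[bin_id]) / self.counts[bin_id]


# Toy loop: pretend that batches from bin 2 (say, the cleanest data) help most.
fake_reward = {0: 0.0, 1: 0.2, 2: 1.0}
policy = EpsilonGreedyCurriculum(num_bins=3)
for _ in range(500):
    b = policy.choose()
    policy.update(b, fake_reward[b])
print(policy.counts)
```

A bandit ignores the model's training state; a full RL formulation would condition the choice of bin on that state.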
An Empirical Exploration of Curriculum Learning for Neural Machine Translation
This work adopts a probabilistic view of curriculum learning, which makes it possible to flexibly evaluate the impact of curriculum design; an extensive exploration on a German-English translation task shows that convergence time can be improved at no loss in translation quality.
Competence-based Curriculum Learning for Neural Machine Translation
A curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and improves overall performance for both recurrent neural network models and Transformers.
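The sketch below assumes the common description of competence-based sampling: each example gets a difficulty (source length here), difficulties are mapped to their empirical CDF, and at step t the model samples only from examples whose CDF value is at most a competence c(t) that grows from c0 to 1. The square-root schedule c(t) = min(1, sqrt(t(1 - c0^2)/T + c0^2)) and the default c0 = 0.01 are given from memory and should be checked against the paper; all function names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence


def sqrt_competence(t: int, T: int, c0: float = 0.01) -> float:
    """Square-root competence: fraction of the easiest data available at step t."""
    return min(1.0, math.sqrt(t * (1.0 - c0 ** 2) / T + c0 ** 2))


def difficulty_cdf(difficulties: Sequence[float]) -> List[float]:
    """Map each example's difficulty to its empirical CDF value in (0, 1]."""
    sorted_d = sorted(difficulties)
    n = len(sorted_d)
    return [bisect.bisect_right(sorted_d, d) / n for d in difficulties]


def sample_batch(cdf: Sequence[float], competence: float, batch_size: int,
                 rng: random.Random) -> List[int]:
    """Sample example indices uniformly among those with CDF <= competence."""
    eligible = [i for i, c in enumerate(cdf) if c <= competence]
    if not eligible:  # fall back to the single easiest example
        eligible = [min(range(len(cdf)), key=lambda i: cdf[i])]
    return [rng.choice(eligible) for _ in range(batch_size)]


lengths = [5, 1, 3, 2]            # difficulty = source length (assumed criterion)
cdf = difficulty_cdf(lengths)     # [1.0, 0.25, 0.75, 0.5]
c = sqrt_competence(t=25, T=100)  # slightly above 0.5
print(sample_batch(cdf, c, batch_size=6, rng=random.Random(0)))  # indices drawn from {1, 3}
```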
Curriculum Learning for Domain Adaptation in Neural Machine Translation
This work introduces a curriculum learning approach for adapting generic neural machine translation models to a specific domain; it consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.
Multi-Domain Neural Machine Translation through Unsupervised Adaptation
This work explores an efficient instance-based adaptation method that, by exploiting the similarity between the training instances and each test sentence, dynamically sets the hyperparameters of the learning algorithm and updates the generic model on-the-fly.
Effective Domain Mixing for Neural Machine Translation
This work shows that training NMT systems on naively mixed data can degrade performance relative to models fit to each constituent domain, and proposes three models that handle mixed-domain data by jointly learning domain discrimination and translation.
Neural Machine Translation Training in a Multi-Domain Scenario
The findings on Arabic-English and German-English language pairs show that the best translation quality can be achieved by building an initial system on a concatenation of available out-of-domain data and then fine-tuning it on in-domain data; data selection, model stacking, and weighted ensembles did not give the best results.
Curriculum Learning and Minibatch Bucketing in Neural Machine Translation
This work examines the effects of particular orderings of sentence pairs on the online training of neural machine translation (NMT), focusing on two strategies: ensuring that each minibatch contains sentences that are similar in some aspect, and gradually including certain sentence types as training progresses.
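A minimal sketch of the bucketing strategy, assuming source length as the similarity criterion: sort sentence pairs by length, cut the sorted list into minibatches, then shuffle the order of the minibatches so training does not see lengths in a strictly increasing sequence. The summary leaves the criterion open ("similar in some aspect"), so length is only one possible choice; the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def bucketed_minibatches(pairs: Sequence[Tuple[str, str]], batch_size: int,
                         seed: int = 0) -> List[List[int]]:
    """Group pairs with similar source length into minibatches of indices."""
    # Sort indices by source length so neighbours have similar lengths.
    order = sorted(range(len(pairs)), key=lambda i: len(pairs[i][0].split()))
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle batch order (not batch contents) so lengths are not seen monotonically.
    random.Random(seed).shuffle(batches)
    return batches


pairs = [("a", "x"), ("a b c d e", "y"), ("a b", "z"), ("a b c d e f", "w")]
print(bucketed_minibatches(pairs, batch_size=2))  # [0, 2] and [1, 3], in shuffled order
```

The gradual-inclusion strategy could be layered on top by filtering `pairs` with a criterion that loosens as training progresses.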
Dynamic Data Selection for Neural Machine Translation
This paper introduces ‘dynamic data selection’ for NMT, a method in which the selected subset of training data is varied between different training epochs, and shows that the best results are achieved when applying a technique called ‘gradual fine-tuning’.
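A minimal sketch of the "gradual fine-tuning" variant, assuming each training pair already has a relevance score (higher means closer to the target domain) and that the selected subset shrinks by a constant factor every epoch. The geometric `decay`, the function name, and the default values are assumptions for illustration; the summary does not give the paper's schedule.

```python
from typing import List, Sequence


def gradual_subset(relevance: Sequence[float], epoch: int, start_frac: float = 1.0,
                   decay: float = 0.5, min_size: int = 1) -> List[int]:
    """Indices of the most relevant examples for `epoch`; the subset shrinks geometrically."""
    n = len(relevance)
    size = int(n * start_frac * (decay ** epoch))
    size = min(n, max(min_size, size))
    # Rank by relevance (highest first) and keep the top `size` examples.
    ranked = sorted(range(n), key=lambda i: relevance[i], reverse=True)
    return ranked[:size]


relevance = [0.1, 0.9, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6]
for epoch in range(4):
    print(epoch, gradual_subset(relevance, epoch))
# 0 [1, 5, 3, 7, 2, 4, 6, 0]
# 1 [1, 5, 3, 7]
# 2 [1, 5]
# 3 [1]
```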
Domain Adaptation via Pseudo In-Domain Data Selection
The results show that more training data is not always better, and that best results are attained via proper domain-relevant data selection, as well as combining in- and general-domain systems during decoding.
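The summary does not name the selection criterion. One criterion widely associated with pseudo in-domain data selection is the cross-entropy difference: score each general-domain sentence by its per-token cross-entropy under an in-domain language model minus its cross-entropy under a general-domain one, and keep the lowest-scoring sentences. The sketch uses add-one-smoothed unigram models as a toy stand-in for real n-gram or neural LMs; the class and function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


class UnigramLM:
    """Add-one-smoothed unigram LM (toy stand-in for a real language model)."""

    def __init__(self, sentences: Sequence[str], vocab: Set[str]):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    def cross_entropy(self, sentence: str) -> float:
        """Per-token cross-entropy in bits."""
        toks = sentence.split()
        if not toks:
            return 0.0
        denom = self.total + self.vocab_size
        log_prob = sum(math.log2((self.counts[t] + 1) / denom) for t in toks)
        return -log_prob / len(toks)


def cross_entropy_difference(candidates: Sequence[str], in_domain: Sequence[str],
                             general: Sequence[str]) -> List[float]:
    """H_in(s) - H_general(s) per candidate; lower means more in-domain-like."""
    vocab = {t for s in list(in_domain) + list(general) for t in s.split()}
    lm_in = UnigramLM(in_domain, vocab)
    lm_general = UnigramLM(general, vocab)
    return [lm_in.cross_entropy(s) - lm_general.cross_entropy(s) for s in candidates]


in_domain = ["patient dose mg", "dose patient tablet"]
general = ["the match ended", "the team won the match"]
candidates = ["patient dose", "the match"]
scores = cross_entropy_difference(candidates, in_domain, general)
print([round(s, 2) for s in scores])  # [-1.75, 1.62]
```

Sentences with negative scores look more like the in-domain data than the general data; keeping the N lowest-scoring sentences yields a pseudo in-domain subset.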