Gradient-guided Loss Masking for Neural Machine Translation
@article{Wang2021GradientguidedLM,
  title   = {Gradient-guided Loss Masking for Neural Machine Translation},
  author  = {Xinyi Wang and Ankur Bapna and Melvin Johnson and Orhan Firat},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2102.13549}
}
To mitigate the negative effect of low-quality training data on the performance of neural machine translation models, most existing strategies focus on filtering out harmful data before training starts. In this paper, we explore strategies that dynamically optimize data usage during the training process using the model’s gradients on a small set of clean data. At each training step, our algorithm calculates the gradient alignment between the training data and the clean data to mask out data…
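To make the masking idea concrete, below is a minimal PyTorch-style sketch of one training step in this spirit: per-example gradients on the training batch are compared with the gradient on a small clean batch, and examples whose gradients do not align are masked out of the loss. The helper names, the positive-dot-product masking rule, and the per-example (rather than per-token) granularity are illustrative assumptions; the paper's actual algorithm and its efficiency considerations may differ.

```python
import torch

def flat_grad(loss, params):
    """Flatten the gradients of `loss` w.r.t. `params` into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    return torch.cat([
        (g if g is not None else torch.zeros_like(p)).reshape(-1)
        for g, p in zip(grads, params)
    ])

def masked_training_step(model, loss_fn, optimizer, train_batch, clean_batch):
    """One step: keep only examples whose gradient aligns with the clean-data gradient."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient on the small trusted (clean) batch.
    clean_x, clean_y = clean_batch
    clean_grad = flat_grad(loss_fn(model(clean_x), clean_y), params)

    # Per-example gradients on the (possibly noisy) training batch.
    x, y = train_batch
    per_example_losses, keep = [], []
    for i in range(x.size(0)):
        loss_i = loss_fn(model(x[i:i + 1]), y[i:i + 1])
        g_i = flat_grad(loss_i, params)
        keep.append((g_i @ clean_grad).item() > 0.0)   # illustrative alignment rule
        per_example_losses.append(loss_i)

    mask = torch.tensor(keep, dtype=torch.float32, device=x.device)
    if mask.sum() == 0:
        return None   # every example was masked out this step

    # Update only on the surviving examples.
    loss = (torch.stack(per_example_losses) * mask).sum() / mask.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Computing per-example gradients in a loop is the clearest but slowest option; a practical implementation would typically restrict the comparison to a subset of parameters or use vectorized per-example gradients.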
7 Citations
Data Selection Curriculum for Neural Machine Translation
- Computer Science, ArXiv
- 2022
This work introduces a two-stage curriculum training framework for NMT in which a base NMT model is tuned on subsets of the data, selected both by deterministic scoring using pre-trained methods and by online scoring that considers prediction scores of the emerging NMT model.
Improving Multilingual Translation by Representation and Gradient Regularization
- Computer Science, EMNLP
- 2021
This work proposes a joint approach that regularizes NMT models at both the representation level and the gradient level, and demonstrates that it is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer
- Computer Science, NAACL-HLT
- 2022
This paper proposes a one-step mixed training method that trains on both source and target data with stochastic gradient surgery, a novel gradient-level optimization; it achieves state-of-the-art performance on all tasks and outperforms target adapting by a large margin.
The Trade-offs of Domain Adaptation for Neural Language Models
- Computer Science, ACL
- 2022
This work shows how adaptation techniques based on data selection, such as importance sampling, intelligent data selection, and influence functions, can be presented in a common framework that highlights their similarities as well as their subtle differences.
On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation
- Computer Science, ArXiv
- 2021
This work assesses the complementarity of data selection with fine-tuning and arrives at the practical recommendation that data selection with domain classifiers is often more effective than the popular contrastive data selection method.
Influence Functions for Sequence Tagging Models
- Computer Science, ArXiv
- 2022
The practical utility of segment influence is shown by using the method to identify systematic annotation errors in two named entity recognition corpora and by measuring the effect that perturbing the labels within a segment has on a segment-level test prediction.
Switchable Representation Learning Framework with Self-compatibility
- Computer Science, ArXiv
- 2022
This work proposes a Switchable representation learning Framework with Self-Compatibility (SFSC), which generates a series of compatible sub-models with different capacities through one training process and achieves state-of-the-art performance on the evaluated dataset.
References
Showing 1–10 of 14 references
Dynamic Data Selection for Neural Machine Translation
- Computer Science, EMNLP
- 2017
This paper introduces ‘dynamic data selection’ for NMT, a method in which the selected subset of training data is varied between different training epochs, and shows that the best results are achieved when applying a technique called ‘gradual fine-tuning’.
Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection
- Computer Science, WMT
- 2018
This work presents methods for measuring and selecting data for domain MT, applies them to denoising NMT training, and shows their significant effectiveness for training NMT models on data with severe noise.
Balancing Training for Multilingual Neural Machine Translation
- Computer Science, ACL
- 2020
Experiments show that the proposed method not only consistently outperforms heuristic baselines in terms of average performance, but also offers flexible control over which languages' performance is optimized.
On the Impact of Various Types of Noise on Neural Machine Translation
- Computer Science, NMT@ACL
- 2018
It is found that neural models are generally more harmed by noise than statistical models, and for one especially egregious type of noise they learn to just copy the input sentence.
Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora
- Computer Science, WMT
- 2018
This work introduces dual conditional cross-entropy filtering for noisy parallel data and achieves higher BLEU scores with models trained on parallel data filtered only from Paracrawl than with models trained on clean WMT data.
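For context on how the filtering above works, the scoring rule can be sketched as follows; this is a paraphrase under our reading of the WMT 2018 system description, with our own notation, so the exact normalization used there may differ. Two translation models A (source-to-target) and B (target-to-source) assign length-normalized conditional cross-entropies to a sentence pair (x, y), and a pair is favored when both values are low and close to each other:

```latex
\operatorname{score}(x, y) =
  \bigl|\, H_A(y \mid x) - H_B(x \mid y) \,\bigr|
  + \tfrac{1}{2}\,\bigl( H_A(y \mid x) + H_B(x \mid y) \bigr)
```

Lower is better; exp(-score) maps the value into (0, 1] if a bounded quality weight is needed.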
Attention is All you Need
- Computer Science, NIPS
- 2017
A new, simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
A Call for Clarity in Reporting BLEU Scores
- Computer Science, WMT
- 2018
Pointing to the success of the parsing community, it is suggested that machine translation researchers settle upon a BLEU scheme that does not allow for user-supplied reference processing, and a new tool, SACREBLEU, is provided to facilitate this.
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
- Computer Science, EMNLP
- 2018
This work presents SentencePiece, a language-independent subword tokenizer and detokenizer designed for neural text processing, and finds that it is possible to achieve comparable accuracy with direct subword training from raw sentences.
Intelligent Selection of Language Model Training Data
- Computer Science, ACL
- 2010
We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on…
Learning to Reweight Examples for Robust Deep Learning
- Computer Science, ICML
- 2018
This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions; the method can be easily implemented on any type of deep network, requires no additional hyperparameter tuning, and achieves impressive performance on class-imbalance and corrupted-label problems where only a small amount of clean validation data is available.
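Since this reference is the closest precursor to gradient-guided masking, a compact sketch of its core idea may help: take a virtual SGD step under per-example weights, measure how the loss on a small clean batch responds, and turn the clipped, normalized negative gradient of that clean loss with respect to the weights into example weights. The toy linear model, data, and single-step lookahead below are illustrative assumptions, not the authors' exact recipe.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_train, n_clean, lr = 5, 32, 8, 0.1
W = torch.zeros(d, 2, requires_grad=True)                        # toy linear model
x, y = torch.randn(n_train, d), torch.randint(0, 2, (n_train,))  # noisy training batch
cx, cy = torch.randn(n_clean, d), torch.randint(0, 2, (n_clean,))  # small clean batch

# 1) Weighted training loss, with example weights eps initialized to zero.
eps = torch.zeros(n_train, requires_grad=True)
train_loss = (eps * F.cross_entropy(x @ W, y, reduction="none")).sum()

# 2) Virtual SGD step on W, keeping the graph so eps stays differentiable.
(gW,) = torch.autograd.grad(train_loss, W, create_graph=True)
W_virtual = W - lr * gW

# 3) How does the clean-data loss respond to each example's weight?
clean_loss = F.cross_entropy(cx @ W_virtual, cy)
(g_eps,) = torch.autograd.grad(clean_loss, eps)

# 4) Example weights: clipped, normalized negative gradient.
w = torch.clamp(-g_eps, min=0.0)
w = w / (w.sum() + 1e-8)

# 5) Real update using the learned example weights.
loss = (w * F.cross_entropy(x @ W, y, reduction="none")).sum()
(gW_real,) = torch.autograd.grad(loss, W)
with torch.no_grad():
    W -= lr * gW_real
print(w)   # examples whose gradients would have helped the clean data get nonzero weight
```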