Non-Parametric Adaptation for Neural Machine Translation

@inproceedings{Bapna2019NonParametricAF,
  title={Non-Parametric Adaptation for Neural Machine Translation},
  author={Ankur Bapna and Orhan Firat},
  booktitle={NAACL},
  year={2019}
}
Neural networks trained with gradient descent are known to be susceptible to catastrophic forgetting caused by parameter shift during the training process. In the context of Neural Machine Translation (NMT) this results in poor performance on heterogeneous datasets and on sub-tasks like rare phrase translation. On the other hand, non-parametric approaches are immune to forgetting, perfectly complementing the generalization ability of NMT. However, attempts to combine non-parametric or retrieval…
Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation
TLDR
This paper proposes a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for k-nearest-neighbor retrieval, and inserts lightweight adapters into the original NMT model to map the token-level representation of this task to the ideal representation of the translation task.
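As a rough illustration of the token-level k-nearest-neighbor retrieval that such datastore-based approaches build on, the following sketch stores (decoder hidden state, next target token) pairs and interpolates the retrieved distribution with the model's own prediction. All function names, the distance metric, and the hyperparameters are illustrative assumptions, not the paper's implementation.

import numpy as np

def build_datastore(hidden_states, target_tokens):
    # Keys are decoder hidden states, values are the target tokens that followed them.
    keys = np.asarray(hidden_states, dtype=np.float32)    # shape (N, d)
    values = np.asarray(target_tokens, dtype=np.int64)    # shape (N,)
    return keys, values

def knn_token_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    # Retrieve the k nearest stored states and turn them into a distribution over tokens.
    dists = np.sum((keys - query) ** 2, axis=1)           # squared L2 distance to every key
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        p_knn[values[idx]] += w
    return p_knn

def interpolate(p_model, p_knn, lam=0.5):
    # Mix the NMT model's distribution with the retrieved one (lam is an assumed mixing weight).
    return (1.0 - lam) * p_model + lam * p_knn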
Learning Kernel-Smoothed Machine Translation with Retrieved Examples
  • Qingnan Jiang, Mingxuan Wang, Jun Cao, Shanbo Cheng, Shujian Huang, Lei Li
  • Computer Science
  • EMNLP
  • 2021
TLDR
This work proposes to learn Kernel-Smoothed Translation with Example Retrieval (KSTER), an effective approach to adapt neural machine translation models online, and shows that even without expensive retraining, KSTER achieves improvements of 1.1 to 1.5 BLEU over the best existing online adaptation methods.
Unsupervised Domain Adaptation for Neural Machine Translation with Domain-Aware Feature Embeddings
TLDR
This work proposes an approach that adapts models with domain-aware feature embeddings, which are learned via an auxiliary language modeling task, and allows the model to assign domain-specific representations to words and output sentences in the desired domain.
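A minimal sketch of the domain-aware embedding idea, under the assumption that a learned per-domain vector is simply added to each word embedding; the exact combination used in the paper, and the auxiliary language-modeling objective that trains it, are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
vocab_size, num_domains, dim = 1000, 4, 32

word_emb = rng.normal(size=(vocab_size, dim)).astype(np.float32)     # ordinary word embeddings
domain_emb = rng.normal(size=(num_domains, dim)).astype(np.float32)  # one learned vector per domain

def embed(token_ids, domain_id):
    # Domain-aware representation: word embedding plus the embedding of the sentence's domain.
    return word_emb[token_ids] + domain_emb[domain_id]

# The same sentence gets different representations under different domains.
tokens = np.array([5, 42, 7])
news_repr = embed(tokens, domain_id=0)
medical_repr = embed(tokens, domain_id=1)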
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
TLDR
This work focuses on more robust approaches to domain adaptation for NMT, particularly the case where a system may need to translate sentences from multiple domains, and divides techniques into those relating to data selection, model architecture, parameter adaptation procedure, and inference procedure.
Non-Parametric Online Learning from Human Feedback for Neural Machine Translation
TLDR
This approach introduces two k-nearest-neighbor modules: one module memorizes the human feedback, which is the correct sentences provided by human translators, while the other adaptively balances the use of this historical human feedback and the original NMT model.
Domain Differential Adaptation for Neural Machine Translation
TLDR
This paper proposes the framework of Domain Differential Adaptation (DDA), where instead of smoothing over differences the authors embrace them, directly modeling the difference between domains using models in a related task, and then using these learned domain differentials to adapt models for the target task accordingly.
Simple, Scalable Adaptation for Neural Machine Translation
TLDR
The proposed approach consists of injecting tiny task-specific adapter layers into a pre-trained model, which adapt the model to multiple individual tasks simultaneously, paving the way towards universal machine translation.
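The adapter idea is easy to make concrete: a small bottleneck network with a residual connection is inserted into an otherwise frozen pre-trained model, and only the adapter parameters are trained per task. The sketch below is a generic NumPy illustration with assumed sizes, not the authors' exact layer.

import numpy as np

class Adapter:
    # A residual bottleneck adapter: down-project, nonlinearity, up-project, add residual.
    def __init__(self, d_model=512, d_bottleneck=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
        self.w_up = rng.normal(scale=0.02, size=(d_bottleneck, d_model))

    def __call__(self, hidden):
        bottleneck = np.maximum(hidden @ self.w_down, 0.0)   # ReLU bottleneck
        return hidden + bottleneck @ self.w_up                # residual connection

# Adapt a (sequence_length, d_model) block of activations from a frozen model.
adapter = Adapter()
adapted = adapter(np.zeros((10, 512)))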
Instance-based Model Adaptation for Direct Speech Translation
TLDR
This work exploits an instance selection procedure to retrieve a small set of samples similar to the input query in terms of latent properties of its audio signal, which are used for an instance-specific fine-tuning of the model.
Lexical Micro-adaptation for Neural Machine Translation
This work is inspired by a typical machine translation industry scenario in which translators make use of in-domain data for facilitating translation of similar or repeating sentences. We introduce a…
Learning to Reuse Translations: Guiding Neural Machine Translation with Examples
TLDR
Experiments show that the noise-masked encoder model allows NMT to learn useful information from examples with low fuzzy match scores (FMS) while the auxiliary decoder model is good for high-FMS examples.
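For readers unfamiliar with fuzzy match scores, a common edit-distance-based definition is FMS = 1 - edit_distance / max_length; the sketch below computes it at the word level. The cited paper's exact scoring may differ.

def edit_distance(a, b):
    # Word-level Levenshtein distance via a one-row dynamic program.
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (wa != wb))
            prev = cur
    return dp[-1]

def fuzzy_match_score(source, example):
    # FMS = 1 - edit_distance / max_length, computed over whitespace tokens.
    src, ex = source.split(), example.split()
    return 1.0 - edit_distance(src, ex) / max(len(src), len(ex))

print(fuzzy_match_score("the cat sat on the mat", "the cat sat on a mat"))  # ~0.83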

References

Showing 1-10 of 48 references
Improving Neural Machine Translation Models with Monolingual Data
TLDR
This work pairs monolingual training data with automatic back-translations, treating them as additional parallel training data, and obtains substantial improvements on the WMT 15 English->German task and the low-resource IWSLT 14 Turkish->English task.
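The back-translation recipe itself is straightforward and is sketched below; reverse_model.translate is a hypothetical stand-in for any target-to-source translation system, not a real API.

def back_translate(monolingual_target_sentences, reverse_model):
    # Produce (synthetic source, real target) pairs from target-side monolingual text.
    synthetic_pairs = []
    for target in monolingual_target_sentences:
        synthetic_source = reverse_model.translate(target)  # hypothetical target->source system
        synthetic_pairs.append((synthetic_source, target))
    return synthetic_pairs

def augment_training_data(parallel_pairs, monolingual_target, reverse_model):
    # Mix real parallel data with the synthetic back-translated pairs.
    return parallel_pairs + back_translate(monolingual_target, reverse_model)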
Multi-Domain Neural Machine Translation through Unsupervised Adaptation
TLDR
This work explores an efficient instance-based adaptation method that, by exploiting the similarity between the training instances and each test sentence, dynamically sets the hyperparameters of the learning algorithm and updates the generic model on-the-fly.
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
TLDR
GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Stanford Neural Machine Translation Systems for Spoken Language Domains
TLDR
This work further explores the effectiveness of NMT in spoken language domains by participating in the MT track of the IWSLT 2015 and demonstrates that using an existing NMT framework can achieve competitive results in the aforementioned scenarios when translating from English to German and Vietnamese.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
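The soft-search mechanism described here is the additive attention formulation; the sketch below scores each encoder state against the current decoder state, normalizes with a softmax, and forms a weighted context vector in place of a single fixed-length summary. Parameter shapes are illustrative.

import numpy as np

def additive_attention(decoder_state, encoder_states, w_dec, w_enc, v):
    # score_i = v^T tanh(W_dec s + W_enc h_i): one relevance score per source position.
    scores = np.tanh(decoder_state @ w_dec + encoder_states @ w_enc) @ v
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over source positions
    context = weights @ encoder_states           # weighted sum replaces the fixed-length vector
    return weights, context

# Example with random parameters: 7 source positions, hidden size 16.
rng = np.random.default_rng(0)
enc, dec = rng.normal(size=(7, 16)), rng.normal(size=(16,))
w_dec, w_enc, v = rng.normal(size=(16, 16)), rng.normal(size=(16, 16)), rng.normal(size=(16,))
attn_weights, context = additive_attention(dec, enc, w_dec, w_enc, v)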
Guiding Neural Machine Translation with Retrieved Translation Pieces
TLDR
This paper proposes a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process, and compares favorably to another alternative retrieval-based method with respect to accuracy, speed, and simplicity of implementation.
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
TLDR
This work introduces a new type of linear connections, named fast-forward connections, based on deep Long Short-Term Memory (LSTM) networks, and an interleaved bi-directional architecture for stacking the LSTM layers, and achieves state-of-the-art performance and outperforms the best conventional model by 0.7 BLEU points.
Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario
TLDR
This paper compares the performance of a generic NMT system and a phrase-based statistical machine translation (PBMT) system by training them on a generic parallel corpus composed of data from different domains, and shows that PBMT outperforms its neural counterpart.