On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation
@article{Iter2021OnTC,
  title   = {On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation},
  author  = {Dan Iter and David Grangier},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2109.07591}
}
Domain adaptation of neural networks commonly relies on three training phases: pretraining, selected data training, and then fine tuning. Data selection improves target domain generalization by training further on pretraining data identified using a small sample of target domain data. This work examines the benefit of data selection for language modeling and machine translation. Our experiments assess the complementarity of selection with fine tuning and result in practical…
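The three-phase recipe the abstract describes can be pictured as: pretrain on general data, continue training on a selected, target-like slice of that data, then fine-tune on the small target-domain sample. The sketch below illustrates only the selection step, using a deliberately crude lexical-overlap scorer as a stand-in for the selection methods the paper actually studies; all function names here are hypothetical, not the paper's API.

```python
from collections import Counter

def target_vocab(target_sample):
    """Token counts over the small target-domain sample."""
    counts = Counter()
    for text in target_sample:
        counts.update(text.split())
    return counts

def similarity_to_target(text, vocab):
    """Crude proxy score: fraction of tokens also seen in the target sample."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)

def select_in_domain(pretrain_data, target_sample, keep_fraction=0.1):
    """Rank pretraining examples by target similarity and keep the top slice."""
    vocab = target_vocab(target_sample)
    ranked = sorted(pretrain_data,
                    key=lambda text: similarity_to_target(text, vocab),
                    reverse=True)
    return ranked[:max(1, int(keep_fraction * len(ranked)))]

# Three-phase recipe (training calls sketched as comments only):
# 1. pretrain the model on pretrain_data
# 2. continue training on select_in_domain(pretrain_data, target_sample)
# 3. fine-tune on target_sample
```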
5 Citations
The Trade-offs of Domain Adaptation for Neural Language Models
- Computer Science · ACL
- 2022
This work shows how adaptation techniques based on data selection, such as importance sampling, intelligent data selection, and influence functions, can be presented in a common framework that highlights both their similarities and their subtle differences.
On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey
- Computer Science · ArXiv
- 2022
A taxonomy of domain adaptation approaches from a machine learning system view is proposed, covering methods for input augmentation, model optimization, and personalization, and shedding light on how to apply traditional machine learning methods to newly evolved and future technologies.
Generalizing through Forgetting - Domain Generalization for Symptom Event Extraction in Clinical Notes
- Computer Science · ArXiv
- 2022
This paper proposes a domain generalization method that dynamically masks frequent symptom words in the source domain, with pretraining and fine-tuning on data that differs from the target domain in terms of institution and/or specialty and patient population.
In-Context Demonstration Selection with Cross Entropy Difference
- Computer Science
- 2023
This work uses parameter-efficient fine-tuning to train small models on the training data; these models are used to compute the cross-entropy difference between a test example and every candidate in-context demonstration, which in turn is used to rank and select in-context demonstrations independently for each test input.
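A minimal sketch of the ranking idea that summary describes, assuming the two loss functions are supplied by the caller (e.g. a generic base model and a small model parameter-efficiently adapted toward a candidate demonstration); this illustrates cross-entropy-difference ranking in general, not the paper's implementation.

```python
def ced_rank_demonstrations(test_input, demonstrations, base_loss, adapted_loss):
    """
    Rank candidate in-context demonstrations by cross-entropy difference.

    base_loss(test_input)          -> loss of the test input under a generic model
    adapted_loss(demo, test_input) -> loss of the test input under a small model
                                      adapted toward `demo` (hypothetical callables)
    A larger drop in loss suggests the demonstration is more helpful for this input.
    """
    base = base_loss(test_input)
    scores = []
    for demo in demonstrations:
        gain = base - adapted_loss(demo, test_input)
        scores.append((gain, demo))
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [demo for _, demo in scores]
```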
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
- Computer Science · ArXiv
- 2023
These findings constitute the largest set of experiments to date validating, quantifying, and exposing many undocumented intuitions about text pretraining, which the authors hope will support more informed data-centric decisions in LM development.
44 References
Dynamic Data Selection for Neural Machine Translation
- Computer Science · EMNLP
- 2017
This paper introduces ‘dynamic data selection’ for NMT, a method in which the selected subset of training data is varied between different training epochs, and shows that the best results are achieved when applying a technique called ‘gradual fine-tuning’.
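A small sketch of what an epoch-wise "gradual" schedule could look like, assuming the data is already ranked from most to least in-domain; the linear decay used here is an illustrative choice, not the schedule from the paper.

```python
def gradual_selection_schedule(ranked_data, num_epochs,
                               start_fraction=1.0, end_fraction=0.05):
    """
    Epoch-wise data selection: each epoch trains on a smaller, more
    in-domain slice of the data. `ranked_data` is assumed to be sorted
    from most to least target-domain-like.
    """
    subsets = []
    for epoch in range(num_epochs):
        # Linearly interpolate the kept fraction from start to end.
        t = epoch / max(1, num_epochs - 1)
        fraction = start_fraction + t * (end_fraction - start_fraction)
        keep = max(1, int(fraction * len(ranked_data)))
        subsets.append(ranked_data[:keep])
    return subsets
```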
Domain Adaptation via Pseudo In-Domain Data Selection
- Computer Science · EMNLP
- 2011
The results show that more training data is not always better, and that best results are attained via proper domain-relevant data selection, as well as combining in- and general-domain systems during decoding.
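The selection criterion behind this line of work is the cross-entropy difference between an in-domain and a general-domain language model. A minimal sketch follows, assuming per-word cross-entropy scorers for the two language models are available as callables supplied by any LM toolkit.

```python
def cross_entropy_difference(sentence, in_domain_ce, general_ce):
    """
    Cross-entropy difference score for pseudo in-domain selection: the
    sentence's per-word cross-entropy under an in-domain LM minus its
    cross-entropy under a general-domain LM. Lower (more negative) scores
    indicate sentences that look in-domain relative to the general corpus.
    `in_domain_ce` and `general_ce` are assumed callables returning
    per-word cross-entropy.
    """
    return in_domain_ce(sentence) - general_ce(sentence)

def select_pseudo_in_domain(general_corpus, in_domain_ce, general_ce,
                            keep_fraction=0.2):
    """Keep the slice of the general corpus with the lowest CED scores."""
    ranked = sorted(general_corpus,
                    key=lambda s: cross_entropy_difference(s, in_domain_ce, general_ce))
    return ranked[:max(1, int(keep_fraction * len(ranked)))]
```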
Dynamic Data Selection and Weighting for Iterative Back-Translation
- Computer Science · EMNLP
- 2020
This work provides insight into this commonly used approach, generalizes it to a dynamic curriculum learning strategy applied to iterative back-translation models, and proposes weighting strategies based on both the current quality of a sentence and its improvement over the previous iteration.
Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
- Computer Science · ACL
- 2013
It is found that neural language models are indeed viable tools for data selection: while the improvements are varied, they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
- Computer Science · ACL
- 2020
It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Computer Science · J. Mach. Learn. Res.
- 2020
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Gradient-guided Loss Masking for Neural Machine Translation
- Computer Science · ArXiv
- 2021
This paper explores strategies that dynamically optimize data usage during training, using the model's gradients on a small set of clean data to mitigate the negative effect of low-quality training data on the performance of neural machine translation models.
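A toy illustration of gradient-guided masking, assuming per-example gradients and a clean-set gradient are available as plain vectors; the simple dot-product criterion is an assumption used to show the shape of the idea, not the paper's exact masking rule.

```python
import numpy as np

def gradient_alignment_mask(example_grads, clean_grad):
    """
    Keep an example's loss only if its gradient has a non-negative dot
    product with the gradient computed on a small trusted/clean set
    (an assumed criterion for illustration). Gradients are plain vectors.
    """
    clean = np.asarray(clean_grad)
    mask = []
    for g in example_grads:
        aligned = float(np.dot(np.asarray(g), clean)) >= 0.0
        mask.append(1.0 if aligned else 0.0)
    return np.array(mask)

# Usage: multiply each example's loss by its mask entry before averaging,
# so examples whose updates conflict with the clean-data direction are dropped.
```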
Reinforcement Learning based Curriculum Optimization for Neural Machine Translation
- Computer Science · NAACL
- 2019
This work uses reinforcement learning to learn a curriculum automatically, jointly with the NMT system, in the course of a single training run, and shows that this approach can beat uniform baselines on Paracrawl and WMT English-to-French datasets.
Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection
- Computer Science · WMT
- 2018
This work introduces methods for measuring and selecting data for domain MT, applies them to denoising NMT training, and shows their significant effectiveness for training NMT on data with severe noise.
Semi-Supervised Learning and Domain Adaptation in Natural Language Processing
- Computer Science · Semi-Supervised Learning and Domain Adaptation in Natural Language Processing
- 2013
This book introduces basic supervised learning algorithms applicable to natural language processing (NLP) and shows how the performance of these algorithms can often be improved by exploiting the…