• Corpus ID: 237532329

On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation

  • Dan Iter, David Grangier
Domain adaptation of neural networks commonly relies on three training phases: pretraining, selected data training and then fine tuning. Data selection improves target domain generalization by training further on pretraining data identified by relying on a small sample of target domain data. This work examines the benefit of data selection for language modeling and machine translation. Our experiments assess the complementarity of selection with fine tuning and result in practical… 

The Trade-offs of Domain Adaptation for Neural Language Models

This work presents how adaptation techniques based on data selection, such as importance sampling, intelligent data selection and influence functions, can be presented in a common framework which highlights their similarity and also their subtle differences.

On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey

This survey proposes a taxonomy of domain adaptation approaches from a machine learning system view, covering methods for input augmentation, model optimization and personalization, and sheds light on how to apply traditional machine learning methods to newly evolved and future technologies.

Generalizing through Forgetting - Domain Generalization for Symptom Event Extraction in Clinical Notes

This paper proposes a domain generalization method that dynamically masks frequent symptom words in the source domain, and pretrains and fine-tunes on data that differs from the target domain in terms of institution and/or specialty and patient population.

In-Context Demonstration Selection with Cross Entropy Difference

  • Dan Iter, Reid Pryzant, Chenguang Zhu
  • Computer Science
  • 2023
This work utilizes parameter efficient finetuning to train small models on training data that are used for computing the cross-entropy difference between a test example and every candidate in-context demonstration, which is used to rank and select in-context demonstrations independently for each test input.
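
The ranking idea in this summary can be illustrated with a minimal sketch. It is not the paper's implementation: smoothed unigram models stand in for the parameter-efficient fine-tuned models, and "adapting" to a candidate is simulated by upweighting its tokens. The names `unigram_model`, `cross_entropy`, `rank_demonstrations` and the `boost` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def cross_entropy(model, tokens):
    """Average negative log-probability (nats per token) under the model."""
    return -sum(math.log(model[t]) for t in tokens) / len(tokens)


def rank_demonstrations(test, candidates, base_corpus, boost=5):
    """Rank candidates by cross-entropy difference on the test input.

    Score = CE(test | model adapted to candidate) - CE(test | base model).
    Lower is better: the candidate made the test input easier to predict.
    """
    base_tokens = base_corpus.split()
    test_tokens = test.split()
    vocab = set(base_tokens) | set(test_tokens)
    for cand in candidates:
        vocab |= set(cand.split())

    base_ce = cross_entropy(unigram_model(base_tokens, vocab), test_tokens)
    scored = []
    for cand in candidates:
        # Assumption: repeating the candidate's tokens stands in for fine-tuning.
        adapted = unigram_model(base_tokens + cand.split() * boost, vocab)
        scored.append((cross_entropy(adapted, test_tokens) - base_ce, cand))
    return sorted(scored)


base_corpus = "the cat sat on the mat and the dog ran in the park"
candidates = [
    "translate the sentence into french",
    "the dog chased the cat in the park",
]
test = "the cat ran in the park"

ranking = rank_demonstrations(test, candidates, base_corpus)
best_score, best_demo = ranking[0]
print(best_demo)
```

Each test input is scored against every candidate independently, so the selected demonstration can differ from one test input to the next.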

A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

These findings constitute the largest set of experiments to validate, quantify, and expose many undocumented intuitions about text pretraining, which are hoped to help support more informed data-centric decisions in LM development.

Dynamic Data Selection for Neural Machine Translation

This paper introduces ‘dynamic data selection’ for NMT, a method in which the selected subset of training data is varied between different training epochs, and shows that the best results are achieved when applying a technique called ‘gradual fine-tuning’.
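
A rough sketch of how a selected subset might vary between epochs is given below. It assumes that "gradual fine-tuning" means shrinking the subset toward the best-ranked sentences every few epochs; the summary does not specify the schedule, so `gradual_subsets` and its parameters (`start_fraction`, `shrink`, `epochs_per_stage`) are hypothetical.

```python
def gradual_subsets(scored_sentences, num_epochs=6, start_fraction=1.0,
                    shrink=0.5, epochs_per_stage=2):
    """Yield (epoch, subset) pairs; the subset shrinks toward top-ranked data.

    scored_sentences: iterable of (score, sentence), lower score = more in-domain.
    """
    ranked = [sentence for _, sentence in sorted(scored_sentences)]
    for epoch in range(num_epochs):
        stage = epoch // epochs_per_stage
        fraction = start_fraction * (shrink ** stage)
        size = max(1, int(len(ranked) * fraction))  # never train on an empty set
        yield epoch, ranked[:size]


# Toy relevance scores in arbitrary order; s0 is the most in-domain sentence.
scored = [(i / 10, f"s{i}") for i in reversed(range(8))]

schedule = list(gradual_subsets(scored))
sizes = [len(subset) for _, subset in schedule]
final_subset = schedule[-1][1]
print(sizes, final_subset)
```

In a real training loop, each yielded subset would be the data for that epoch, so later epochs see progressively less, but more relevant, data.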

Domain Adaptation via Pseudo In-Domain Data Selection

The results show that more training data is not always better, and that best results are attained via proper domain-relevant data selection, as well as combining in- and general-domain systems during decoding.

Dynamic Data Selection and Weighting for Iterative Back-Translation

Insight is provided into this commonly used approach and it is generalized to a dynamic curriculum learning strategy, which is applied to iterative back-translation models, and weighting strategies based on both the current quality of the sentence and its improvement over the previous iteration are proposed.

Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation

It is found that neural language models are indeed viable tools for data selection: while the improvements are varied, they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.

Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks

It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Gradient-guided Loss Masking for Neural Machine Translation

This paper explores strategies that dynamically optimize data usage during the training process using the model's gradients on a small set of clean data to mitigate the negative effect of low quality training data on the performance of neural machine translation models.

Reinforcement Learning based Curriculum Optimization for Neural Machine Translation

This work uses reinforcement learning to learn a curriculum automatically, jointly with the NMT system, in the course of a single training run, and shows that this approach can beat uniform baselines on Paracrawl and WMT English-to-French datasets.

Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

Methods for measuring and selecting data for domain MT are applied to denoising NMT training and are shown to be significantly effective for training NMT on data with severe noise.

Semi-Supervised Learning and Domain Adaptation in Natural Language Processing

  • Anders Søgaard
  • Computer Science
    Semi-Supervised Learning and Domain Adaptation in Natural Language Processing
  • 2013
This book introduces basic supervised learning algorithms applicable to natural language processing (NLP) and shows how the performance of these algorithms can often be improved by exploiting the