Corpus ID: 237513469

Efficient Domain Adaptation of Language Models via Adaptive Tokenization

Vin Sachidananda, Jason S. Kessler, Yi-an Lai
Contextual embedding-based language models pretrained on large corpora, such as BERT and RoBERTa, provide strong performance across a wide range of tasks and are ubiquitous in modern NLP. It has been observed that fine-tuning these models on tasks involving data from domains different from those on which they were pretrained can lead to suboptimal performance. Recent work has explored approaches to adapt pretrained language models to new domains by incorporating additional pretraining on domain…
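The abstract's core idea, adapting a tokenizer's vocabulary to a new domain, can be illustrated with a toy sketch: score each candidate token by how much more frequent it is in a domain corpus than in a general corpus, and keep the top-scoring ones as new vocabulary entries. This is a simplified stand-in for the paper's method, not its exact algorithm; the function name, the whitespace tokenization, the smoothing, and the example corpora are all illustrative assumptions.

```python
from collections import Counter


def select_domain_tokens(domain_corpus, base_corpus, ratio=2.0, max_new=10):
    """Pick whitespace-delimited words whose relative frequency in the
    domain corpus exceeds their (smoothed) frequency in the base corpus
    by at least `ratio` -- a toy proxy for adaptive vocabulary selection."""
    dom = Counter(w for s in domain_corpus for w in s.split())
    base = Counter(w for s in base_corpus for w in s.split())
    dom_total, base_total = sum(dom.values()), sum(base.values())
    scored = []
    for word, count in dom.items():
        p_dom = count / dom_total
        # Add-one smoothing so unseen base-corpus words get a nonzero probability.
        p_base = (base.get(word, 0) + 1) / (base_total + len(base))
        if p_dom / p_base >= ratio:
            scored.append((p_dom / p_base, word))
    # Highest domain-to-base frequency ratio first.
    return [word for _, word in sorted(scored, reverse=True)[:max_new]]


# Hypothetical clinical-domain vs. general-domain snippets.
domain = ["the myocardial infarction cohort", "myocardial enzymes were elevated"]
base = ["the cat sat on the mat", "the dog ran in the park"]
new_tokens = select_domain_tokens(domain, base)
```

In a real pipeline the selected strings would then be registered with the tokenizer and the model's embedding matrix extended accordingly; here the sketch stops at the selection step.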



Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
Multi-phase adaptive pretraining is consistently found to offer large gains in task performance, and adapting to a task corpus augmented using simple data selection strategies is shown to be an effective alternative, especially when resources for domain-adaptive pretraining are unavailable.
Multi-Stage Pretraining for Low-Resource Domain Adaptation
Transfer learning techniques are particularly useful in NLP tasks where a sizable amount of high-quality annotated data is difficult to obtain. Current approaches directly adapt a pre-trained…
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.
Probing Pretrained Language Models for Lexical Semantics
A systematic empirical analysis across six typologically diverse languages and five different lexical tasks indicates patterns and best practices that hold universally, but also points to prominent variations across languages and tasks.
exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources
The exBERT training method is novel in learning the new vocabulary and the extension module while keeping the weights of the original BERT model fixed, resulting in a substantial reduction in required training resources.
SciBERT: A Pretrained Language Model for Scientific Text
SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Publicly Available Clinical BERT Embeddings
This work explores and releases two BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically, and demonstrates that using a domain-specific model yields performance improvements on 3/5 clinical NLP tasks, establishing a new state-of-the-art on the MedNLI dataset.
Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification
This work approaches ATSC using a two-step procedure: self-supervised domain-specific BERT language model finetuning, followed by supervised task-specific finetuning, which enables it to produce new state-of-the-art performance on the SemEval 2014 Task 4 restaurants dataset.
Efficient Intent Detection with Dual Sentence Encoders
The usefulness and wide applicability of the proposed intent detectors are demonstrated, showing that they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets.