Corpus ID: 49313245

Improving Language Understanding by Generative Pre-Training

@inproceedings{Radford2018ImprovingLU,
  title={Improving Language Understanding by Generative Pre-Training},
  author={Alec Radford and Karthik Narasimhan and Tim Salimans and Ilya Sutskever},
  year={2018}
}
Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus… 
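
The truncated abstract describes the paper's two-stage recipe: generative pre-training of a Transformer language model on unlabeled text, followed by discriminative fine-tuning on each labeled target task. Below is a minimal PyTorch sketch of that pattern only; the tiny model, toy tensors, and names such as TinyCausalLM and cls_head are illustrative assumptions, not the paper's actual setup (a 12-layer Transformer decoder trained on BooksCorpus, with structured task inputs serialized into single token sequences and an auxiliary LM objective during fine-tuning).

```python
# Minimal sketch of the two-stage recipe from the abstract:
# (1) generative pre-training of a causal language model on unlabeled text,
# (2) discriminative fine-tuning with a small task-specific head.
# Model sizes, toy data, and all names below are illustrative assumptions.
import torch
import torch.nn as nn

class TinyCausalLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2, max_len=32):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)   # used in stage 1
        self.cls_head = nn.Linear(d_model, 2)           # used in stage 2 (binary task)

    def features(self, x):
        # Shared Transformer body with a causal mask, so each position
        # only attends to earlier tokens (decoder-style language model).
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok(x) + self.pos(pos)
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        return self.blocks(h, mask=causal)

    def lm_logits(self, x):   # next-token prediction for pre-training
        return self.lm_head(self.features(x))

    def cls_logits(self, x):  # classify from the final position's representation
        return self.cls_head(self.features(x)[:, -1])

model = TinyCausalLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

# Stage 1: generative pre-training on (toy) unlabeled token sequences.
unlabeled = torch.randint(0, 1000, (8, 16))
lm_loss = ce(model.lm_logits(unlabeled[:, :-1]).reshape(-1, 1000),
             unlabeled[:, 1:].reshape(-1))
lm_loss.backward(); opt.step(); opt.zero_grad()

# Stage 2: discriminative fine-tuning on a small labeled (toy) classification set.
inputs = torch.randint(0, 1000, (4, 16))
labels = torch.tensor([0, 1, 1, 0])
cls_loss = ce(model.cls_logits(inputs), labels)
cls_loss.backward(); opt.step(); opt.zero_grad()
```

The point of the sketch is that both stages share the same Transformer body; only the output head and the loss change between pre-training and fine-tuning.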

Citations

Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
ANNA: Enhanced Language Representation for Question Answering
TLDR
This paper proposes an extended pre-training task and a new neighbor-aware mechanism that attends more to neighboring tokens to capture the richness of context for pre-training language modeling.
Few-Shot Text Generation with Natural Language Instructions
TLDR
GenPET, a method for text generation based on pattern-exploiting training (a recent approach for combining textual instructions with supervised learning that previously applied only to classification tasks), is introduced.
Self-training Improves Pre-training for Natural Language Understanding
TLDR
SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the web, is introduced.
Go Simple and Pre-Train on Domain-Specific Corpora: On the Role of Training Data for Text Classification
TLDR
This paper compares the performance of a light-weight linear classifier based on word embeddings versus a pre-trained language model, i.e., BERT, across a wide range of datasets and classification tasks, and shows the importance of domain-specific unlabeled data.
KgPLM: Knowledge-guided Language Model Pre-training via Generative and Discriminative Learning
TLDR
This work presents a language model pre-training framework guided by factual knowledge completion and verification, and uses the generative and discriminative approaches cooperatively to learn the model.
On the Multilingual Capabilities of Very Large-Scale English Language Models
TLDR
This work investigates the multilingual skills of GPT-3, focusing on Catalan, a language that barely appears in the pre-training corpus, which makes the results especially meaningful. The model shows outstanding performance, particularly on generative tasks, with predictable limitations mostly on language understanding tasks, but still remarkable results given the zero-shot scenario.
Transfer Learning in Natural Language Processing
TLDR
An overview of modern transfer learning methods in NLP is presented: how models are pre-trained, what information the representations they learn capture, and examples and case studies of how these models can be integrated and adapted in downstream NLP tasks.
Generalizing Question Answering System with Pre-trained Language Model Fine-tuning
TLDR
A multi-task learning framework is proposed that learns a shared representation across different tasks, built on top of a large pre-trained language model and then fine-tuned on multiple reading comprehension (RC) datasets.
Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling
TLDR
Domain-adaptive fine-tuning offers a simple and effective approach for the unsupervised adaptation of sequence labeling to difficult new domains and is tested on sequence labeling in two challenging domains: Early Modern English and Twitter.
…

References

SHOWING 1-10 OF 76 REFERENCES
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
TLDR
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models are presented; the benchmark favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.
Universal Language Model Fine-tuning for Text Classification
TLDR
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
Semi-supervised sequence tagging with bidirectional language models
TLDR
A general semi-supervised approach is presented for adding pretrained context embeddings from bidirectional language models to NLP systems and applying it to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers.
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
TLDR
It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.
Unsupervised Pretraining for Sequence to Sequence Learning
TLDR
This work presents a general unsupervised learning method to improve the accuracy of sequence-to-sequence (seq2seq) models by pretraining the weights of the encoder and decoder with the weights of two pretrained language models, which are then fine-tuned with labeled data.
Reasoning about Entailment with Neural Attention
TLDR
This paper proposes a neural model that reads two sentences to determine entailment using long short-term memory units and extends this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases, and presents a qualitative analysis of attention weights produced by this model.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage…
A Compare-Propagate Architecture with Alignment Factorization for Natural Language Inference
TLDR
A new compare-propagate architecture is introduced where alignment pairs are compared and then propagated to upper layers for enhanced representation learning, and novel factorization layers are adopted for efficient compression of alignment vectors into scalar-valued features, which are then used to augment the base word representations.
Unsupervised Machine Translation Using Monolingual Corpora Only
TLDR
This work proposes a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space and effectively learns to translate without using any labeled data.
…