• Corpus ID: 49313245

Improving Language Understanding by Generative Pre-Training

  title={Improving Language Understanding by Generative Pre-Training},
  author={Alec Radford and Karthik Narasimhan},
Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus… 

Figures and Tables from this paper

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

ANNA”:" Enhanced Language Representation for Question Answering

This paper proposes an extended pre- training task, and a new neighbor-aware mechanism that attends neighboring tokens more to capture the richness of context for pre-training language modeling.

Few-Shot Text Generation with Natural Language Instructions

GenPET, a method for text generation that is based on pattern-exploiting training, a recent approach for combining textual instructions with supervised learning that only works for classification tasks, is introduced.

Self-training Improves Pre-training for Natural Language Understanding

SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the web, is introduced.

Improving Pre-Trained Multilingual Model with Vocabulary Expansion

This work investigates two approaches (i.e., joint mapping and mixture mapping) based on a pre-trained multilingual model BERT for addressing the out-of-vocabulary (OOV) problem on a variety of tasks, including part- of-speech tagging, named entity recognition, machine translation quality estimation, and machine reading comprehension.

Go Simple and Pre-Train on Domain-Specific Corpora: On the Role of Training Data for Text Classification

This paper compares the performance of a light-weight linear classifier based on word embeddings versus a pre-trained language model, i.e., BERT, across a wide range of datasets and classification tasks, and shows the importance of domain-specific unlabeled data.

KgPLM: Knowledge-guided Language Model Pre-training via Generative and Discriminative Learning

This work presents a language model pre-training framework guided by factual knowledge completion and verification, and uses the generative and discriminative approaches cooperatively to learn the model.

LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization

This paper proposes a simple yet effective pretraining method named LICHEE to efficiently incorporate multi-grained information of input text that can be applied to various pretrained language models and improve their representation capability.

On the Multilingual Capabilities of Very Large-Scale English Language Models

This work investigates the multilingual skills of GPT-3, focusing on one language that barely appears in the pre-training corpus, Catalan, which makes the results especially meaningful, and finds that the model shows an outstanding performance, particularly in generative tasks, with predictable limitations mostly in language understanding tasks but still with remarkable results given the zero-shot scenario.

Transfer Learning in Natural Language Processing

An overview of modern transfer learning methods in NLP, how models are pre-trained, what information the representations they learn capture, and review examples and case studies on how these models can be integrated and adapted in downstream NLP tasks are presented.



Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

This work presents a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model and demonstrates that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding and shows that it represents a substantially more difficult task than does the Stanford NLI corpus.

Universal Language Model Fine-tuning for Text Classification

This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine- Tuning a language model.

Semi-supervised sequence tagging with bidirectional language models

A general semi-supervised approach for adding pretrained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task specific gazetteers.

A Simple but Tough-to-Beat Baseline for Sentence Embeddings

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

It is shown how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.

Unsupervised Pretraining for Sequence to Sequence Learning

This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models by pretraining the weights of the encoder and decoder with the pretrained weights of two language models and then fine-tuned with labeled data.

Reasoning about Entailment with Neural Attention

This paper proposes a neural model that reads two sentences to determine entailment using long short-term memory units and extends this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases, and presents a qualitative analysis of attention weights produced by this model.

Skip-Thought Vectors

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the