Corpus ID: 160025533

Language Models are Unsupervised Multitask Learners

@inproceedings{Radford2019LanguageMA,
  title={Language Models are Unsupervised Multitask Learners},
  author={Alec Radford and Jeff Wu and Rewon Child and David Luan and Dario Amodei and Ilya Sutskever},
  year={2019}
}
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset, matching… 
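
The setup the abstract describes, conditioning a language model on a document plus a question and reading the generated continuation as the answer, can be sketched with the publicly released small GPT-2 checkpoint via the Hugging Face transformers library. This is an illustrative sketch, not the paper's evaluation code; the document, question, prompt format, and decoding settings below are assumptions.

    # Minimal zero-shot QA sketch: no fine-tuning and no answer head; the answer
    # is just a language-model continuation of "document + question".
    # Assumes the Hugging Face `transformers` package and the public "gpt2" checkpoint.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Illustrative inputs (not taken from the paper or from CoQA).
    document = "Tom went to the market and bought three apples and two oranges."
    question = "Q: How many apples did Tom buy?\nA:"
    prompt = document + "\n" + question

    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding of a short continuation; everything generated after "A:"
    # is read off as the model's answer.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    answer = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:])
    print(answer.strip())

In the paper's CoQA setting, the conditioning context also includes the history of the conversation before the final "A:" token and decoding is greedy; the toy prompt above only illustrates the general mechanism behind the 55 F1 result.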

Citations

"ANNA": Enhanced Language Representation for Question Answering

TLDR
This paper proposes an extended pre-training task and a new neighbor-aware mechanism that attends more to neighboring tokens to capture the richness of context for pre-training language modeling.

Open-Ended Generative Commonsense Question Answering with Knowledge Graph-enhanced Language Models

TLDR
This work focuses on open-ended question answering, a question-and-answer task that requires commonsense knowledge about the world, and specifically on its generative variant, in which answers must be generated by the model rather than selected from multiple choices.

On the Multilingual Capabilities of Very Large-Scale English Language Models

TLDR
The results show that GPT-3 can be used not only as a powerful generative pre-trained model for English but also for other languages, even some with very little data in the training corpora, with room for improvement if tokenization is optimized.

Multilingual Question Answering from Formatted Text applied to Conversational Agents

TLDR
Experiments are run showing that multilingual BERT, trained to solve the complex question answering task defined by the English SQuAD dataset, is able to perform the same task in Japanese and French.

Unified Language Model Pre-training for Natural Language Understanding and Generation

TLDR
A new Unified pre-trained Language Model (UniLM) is presented that can be fine-tuned for both natural language understanding and generation tasks, and that compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.

Finetuned Language Models Are Zero-Shot Learners

TLDR
It is shown that instruction tuning (finetuning language models on a collection of datasets described via instructions) substantially improves zero-shot performance on unseen tasks and outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenBookQA, and StoryCloze.

Few-Shot Text Generation with Natural Language Instructions

TLDR
This work introduces GenPET, a method for text generation based on pattern-exploiting training, a recent approach for combining textual instructions with supervised learning that previously worked only for classification tasks.

Language Models are Few-shot Multilingual Learners

TLDR
It is shown that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, and that they are competitive with existing state-of-the-art cross-lingual and translation models.

What Makes Data-to-Text Generation Hard for Pretrained Language Models?

TLDR
An empirical study of both pre-trained and auto-regressive PLMs on the multi-domain DART data-to-text (D2T) dataset, which probes the limits of PLMs by measuring performance on subsets of the evaluation data (novel predicates and abstractive test examples) and investigates two techniques to improve performance on those subsets.

Multimodal Few-Shot Learning with Frozen Language Models

TLDR
The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples represented as sequences of interleaved image and text embeddings.
...

References

Showing 1-10 of 75 references

Dialog-based Language Learning

TLDR
This work studies dialog-based language learning, where supervision is given naturally and implicitly in the response of the dialog partner during the conversation, and shows that a novel model incorporating predictive lookahead is a promising approach for learning from a teacher's response.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

TLDR
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models' understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models are presented; the benchmark favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

TLDR
This work presents a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model and demonstrates that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods.

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

TLDR
It is shown how universal sentence representations trained on the supervised data of the Stanford Natural Language Inference (SNLI) dataset can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks.

Learning and Evaluating General Linguistic Intelligence

TLDR
This work analyzes state-of-the-art natural language understanding models and conducts an extensive empirical investigation to evaluate them against general linguistic intelligence criteria, and proposes a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task.

Sequence to Sequence Learning with Neural Networks

TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about sequence structure, and finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and target sentences that made the optimization problem easier.

A Neural Conversational Model

TLDR
A simple approach to conversational modeling that uses the recently proposed sequence-to-sequence framework and is able to extract knowledge both from a domain-specific dataset and from a large, noisy, general-domain dataset of movie subtitles.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Unsupervised Pretraining for Sequence to Sequence Learning

TLDR
This work presents a general unsupervised learning method that improves the accuracy of sequence-to-sequence (seq2seq) models by initializing the weights of the encoder and decoder with the weights of two pretrained language models and then fine-tuning with labeled data.

Unsupervised Machine Translation Using Monolingual Corpora Only

TLDR
This work proposes a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space and effectively learns to translate without using any labeled data.
...