Publications
Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn tasks such as question answering, machine translation, reading comprehension, and summarization without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
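For illustration, here is a minimal sketch of the zero-shot setup the paper describes: a pretrained language model is given an article followed by "TL;DR:" and simply asked to continue, with no task-specific training. The use of the Hugging Face `transformers` pipeline and the "gpt2" checkpoint name are assumptions made for the sketch, not the paper's original code.
```python
# A minimal sketch of zero-shot task induction with a pretrained language model.
# The library, model name, and sampling settings below are illustrative choices.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The paper induces summarization by appending "TL;DR:" to an article and
# letting the model continue the text; no task-specific fine-tuning is used.
article = "A long news article about a local election goes here ..."
prompt = article + "\nTL;DR:"

output = generator(prompt, max_new_tokens=60, do_sample=True, top_k=2,
                   return_full_text=False)
print(output[0]["generated_text"])
```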
Language Models are Few-Shot Learners
TLDR
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
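As an illustration of the few-shot setting, the sketch below builds an in-context prompt of the kind the paper evaluates: a handful of task demonstrations followed by a query, with no gradient updates. The exact prompt wording is a hypothetical choice.
```python
# A minimal sketch of few-shot prompting: demonstrations are placed in the
# context window and the model completes the final query without fine-tuning.
demonstrations = [
    ("3 + 5 =", "8"),
    ("12 + 47 =", "59"),
    ("231 + 406 =", "637"),
]
query = "123 + 456 ="

prompt = "\n".join(f"{q} {a}" for q, a in demonstrations) + f"\n{query}"
print(prompt)
# The prompt would then be fed to the model exactly as in the zero-shot sketch
# above; the only difference is the in-context examples.
```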
Scaling Laws for Neural Language Models
TLDR
Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
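To make the sample-efficiency claim concrete, the sketch below evaluates power-law fits of the form the paper reports for loss as a function of parameter count and dataset size. The constants are approximate values recalled from the paper and are included only for illustration; the exact fitted values should be taken from the paper itself.
```python
# A sketch of the power-law form fitted for language-model loss, to illustrate
# why larger models are more sample-efficient. Constants are approximate and
# for illustration only.
def loss_from_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """L(N) = (N_c / N)^alpha_N, the fit when data is not the bottleneck."""
    return (n_c / n_params) ** alpha_n

def loss_from_data(n_tokens, d_c=5.4e13, alpha_d=0.095):
    """L(D) = (D_c / D)^alpha_D, the fit when model size is not the bottleneck."""
    return (d_c / n_tokens) ** alpha_d

# Each 10x increase in parameters lowers the achievable loss by a constant
# factor, which is why compute-optimal training favors very large models
# stopped well before convergence rather than small models trained to convergence.
for n in (1e8, 1e9, 1e10):
    print(f"N={n:.0e}  L(N)≈{loss_from_params(n):.2f}")
```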
Generative Pretraining From Pixels
TLDR
This work trains a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure, and finds that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification.
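The evaluation mentioned here, linear probing, can be sketched as follows: freeze the pretrained model, extract features, and fit a linear classifier on top. The `encode` function below is only a stand-in for the pretrained pixel Transformer, and the dataset is synthetic; the point is the probing procedure itself, not the features.
```python
# A minimal sketch of linear probing on frozen features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(images):
    """Stand-in for the pretrained model: images -> fixed-size features.

    In the paper, the feature is a hidden layer of the sequence Transformer
    run over the flattened 1D pixel sequence (no 2D structure is used).
    """
    flat = images.reshape(len(images), -1)            # drop 2D structure
    rng = np.random.default_rng(0)
    projection = rng.standard_normal((flat.shape[1], 128))
    return flat @ projection                          # placeholder features

# Tiny synthetic "dataset" just to make the probe runnable end to end.
images = np.random.rand(200, 32, 32, 3)
labels = np.random.randint(0, 10, size=200)

features = encode(images)                             # frozen features
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("train accuracy of linear probe:", probe.score(features, labels))
```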
Release Strategies and the Social Impacts of Language Models
TLDR
This report discusses OpenAI's work on the release of its GPT-2 language model, focusing on staged release, which allows time between model releases to conduct risk and benefit analyses as model sizes increase.
Fine-Tuning Language Models from Human Preferences
TLDR
This paper builds on advances in generative pretraining of language models to apply reward learning to four natural language tasks: continuing text with positive sentiment or physically descriptive language, and summarization tasks on the TL;DR and CNN/Daily Mail datasets.
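One concrete piece of this approach is how the learned reward is combined with a penalty that keeps the fine-tuned policy close to the pretrained language model. The sketch below shows that shaping step under the simplifying assumption of a fixed KL coefficient; the paper optimizes with PPO and an adaptive coefficient, and the function and variable names here are illustrative.
```python
# A sketch of combining a learned reward with a KL-style penalty toward the
# pretrained model during RL fine-tuning. beta and all names are illustrative.
def shaped_reward(reward_model_score, logp_policy, logp_pretrained, beta=0.1):
    """Total reward = learned reward minus beta * divergence penalty.

    reward_model_score: r(x, y) from the reward model trained on human preferences
    logp_policy:        log pi(y | x) under the policy being fine-tuned
    logp_pretrained:    log rho(y | x) under the frozen pretrained model
    """
    kl_penalty = logp_policy - logp_pretrained
    return reward_model_score - beta * kl_penalty

# Example: a sample the reward model likes, but that has drifted away from the
# pretrained distribution, gets part of its reward clawed back.
print(shaped_reward(reward_model_score=1.8, logp_policy=-12.0, logp_pretrained=-15.0))
# 1.8 - 0.1 * 3.0 = 1.5
```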
Learning to summarize from human feedback
TLDR
This work shows that it is possible to significantly improve summary quality by training a model to optimize for human preferences, establishes that the reward model generalizes to new datasets, and finds that optimizing the reward model produces summaries that human evaluators prefer over those produced by optimizing ROUGE.
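The reward model in this line of work is trained on pairwise human comparisons of summaries; a minimal sketch of that preference loss is shown below in PyTorch, with a trivial linear stand-in for the actual Transformer reward model.
```python
# A minimal sketch of the pairwise reward-model loss: the summary humans
# preferred should score higher than the rejected one.
import torch
import torch.nn.functional as F

# Stand-in "reward model": scores a feature vector for each summary. In the
# paper this is a Transformer initialized from the supervised summarizer.
reward_head = torch.nn.Linear(16, 1)

def preference_loss(features_chosen, features_rejected):
    r_chosen = reward_head(features_chosen)
    r_rejected = reward_head(features_rejected)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch of 8 comparison pairs with 16-dimensional features.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)
loss = preference_loss(chosen, rejected)
loss.backward()
print(float(loss))
```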