Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Language Models are Few-Shot Learners
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
It is shown that an end-to-end deep learning approach can recognize either English or Mandarin Chinese speech, two vastly different languages, and is competitive with the transcription of human workers when benchmarked on standard datasets.
Deep Reinforcement Learning from Human Preferences
This work explores goals defined in terms of (non-expert) human preferences between pairs of trajectory segments in order to effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion.
Concrete Problems in AI Safety
Five practical research problems related to accident risk are presented, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process.
Scaling Laws for Neural Language Models
Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
Benchmarking Safe Exploration in Deep Reinforcement Learning
Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies by trial and error. In many environments, safety is a critical concern and certain errors are…
Variational Option Discovery Algorithms
A tight connection between variational option discovery methods and variational autoencoders is highlighted; Variational Autoencoding Learning of Options by Reinforcement (VALOR), a new method derived from the connection, is introduced; and a curriculum learning approach is proposed.
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
The following organisations are named on the report: Future of Humanity Institute, University of Oxford, Centre for the Study of Existential Risk, University of Cambridge, Center for a New American…
Searching for Collective Behavior in a Large Network of Sensory Neurons
The properties of the neural vocabulary are explored by estimating its entropy, which constrains the population's capacity to represent visual information, and classifying activity patterns into a small set of metastable collective modes, showing that the neural codeword ensembles are extremely inhomogeneous.