Human-level control through deep reinforcement learning
TLDR
We use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs through end-to-end reinforcement learning, all with the same algorithm, network architecture and hyperparameters.
  • 11,563
  • 1,944
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
TLDR
We develop a new distributed agent, IMPALA (Importance Weighted Actor-Learner Architecture), that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
  • 560
  • 111
Noisy Networks for Exploration
TLDR
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.
  • 394
  • 57
Universal Intelligence: A Definition of Machine Intelligence
TLDR
A fundamental problem in artificial intelligence is that nobody really knows what intelligence is.
  • 390
  • 42
Deep Reinforcement Learning from Human Preferences
TLDR
We show that learning from human preferences between pairs of trajectory segments can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of the agent's interactions with the environment.
  • 364
  • 34
Massively Parallel Methods for Deep Reinforcement Learning
TLDR
We present the first massively distributed architecture for deep reinforcement learning.
  • 292
  • 25
A Collection of Definitions of Intelligence
TLDR
This chapter is a survey of a large number of informal definitions of "intelligence" that the authors have collected over the years.
  • 256
  • 16
DeepMind Lab
TLDR
DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems.
  • 140
  • 10
Fitness uniform optimization
  • Marcus Hutter, S. Legg
  • Mathematics, Computer Science
  • IEEE Transactions on Evolutionary Computation
  • 1 October 2006
TLDR
We propose a new selection scheme, which is uniform in the fitness values.
  • 69
  • 7
Scalable agent alignment via reward modeling: a research direction
TLDR
We outline a high-level research direction to solve the agent alignment problem centered around reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning.
  • 60
  • 7