• Publications
  • Influence
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks. Expand
Continuous control with deep reinforcement learning
TLDR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs. Expand
Playing Atari with Deep Reinforcement Learning
TLDR
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them. Expand
Asynchronous Methods for Deep Reinforcement Learning
TLDR
A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input. Expand
Deep Reinforcement Learning with Double Q-Learning
TLDR
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games. Expand
Mastering the game of Go with deep neural networks and tree search
TLDR
Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0.5, the first time that a computer program has defeated a human professional player in the full-sized game of Go. Expand
Deterministic Policy Gradient Algorithms
TLDR
This paper introduces an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy and demonstrates that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces. Expand
Prioritized Experience Replay
TLDR
A framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently, in Deep Q-Networks, a reinforcement learning algorithm that achieved human-level performance across many Atari games. Expand
Mastering the game of Go without human knowledge
TLDR
An algorithm based solely on reinforcement learning is introduced, without human data, guidance or domain knowledge beyond game rules, that achieves superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo. Expand
Monte-Carlo Planning in Large POMDPs
TLDR
POMCP is the first general purpose planner to achieve high performance in such large and unfactored POMDPs as 10 x 10 battleship and partially observable PacMan, with approximately 1018 and 1056 states respectively. Expand
...
1
2
3
4
5
...