
- Volodymyr Mnih, Koray Kavukcuoglu, +16 authors Demis Hassabis
- Nature
- 2015

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task:…

- Volodymyr Mnih, Koray Kavukcuoglu, +4 authors Martin A. Riedmiller
- arXiv
- 2013

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven…
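The Q-learning variant at the heart of this model bootstraps toward the target r + γ·max_b Q(s′, b). A minimal tabular sketch of that update (the paper itself replaces the table with a convolutional network trained from an experience-replay buffer; the dictionary here is only a stand-in):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_b Q(s', b)."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return td_error

Q = {}
q_update(Q, "s0", "left", 1.0, "s1", ["left", "right"])  # Q[("s0", "left")] becomes 0.1
```

In the DQN setting the same target is regressed onto by a network, with minibatches sampled from replay memory rather than single online transitions.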

- David Silver, Joel Veness
- NIPS
- 2010

This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent’s belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during…

- Volodymyr Mnih, Adrià Puigdomènech Badia, +5 authors Koray Kavukcuoglu
- ICML
- 2016

We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training, allowing all…
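The core mechanism — many actor-learners applying gradients to shared parameters without locks — can be sketched with plain threads on a toy least-squares objective (the worker loop, learning rate, and target vector below are illustrative, not the paper's RL setup):

```python
import threading
import numpy as np

def actor_learner(theta, target, n_steps=2000, lr=0.01, seed=0):
    """One worker: compute a gradient from its own samples and apply it
    directly to the shared parameter vector, without any lock (Hogwild-style)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        x = rng.normal(size=theta.shape)
        err = float(x @ (theta - target))  # gradient of 0.5 * err**2 w.r.t. theta is err * x
        theta -= lr * err * x              # in-place update on shared memory

theta = np.zeros(4)
target = np.array([1.0, -2.0, 0.5, 3.0])
workers = [threading.Thread(target=actor_learner, args=(theta, target), kwargs={"seed": i})
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Stale or overlapping updates only add noise here; the abstract's point is that in the RL setting this parallelism also decorrelates the workers' experience, stabilizing training without replay memory.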

- Sylvain Gelly, David Silver
- ICML
- 2007

The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo…
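For reference, the selection rule applied at each UCT node is UCB1: choose the action maximizing mean return plus an exploration bonus. A small sketch (the `stats` layout and exploration constant are illustrative; the paper's contribution lies in how the offline TD(λ) value seeds and shapes these online estimates):

```python
import math

def uct_select(stats, c=1.4):
    """stats: {action: (visit_count, total_return)}.
    Unvisited actions are tried first; otherwise pick the argmax of
    mean return + c * sqrt(ln(N) / n), where N is the node's total visits."""
    total = sum(n for n, _ in stats.values())
    def ucb(item):
        _, (n, w) = item
        if n == 0:
            return float("inf")   # force each action to be sampled at least once
        return w / n + c * math.sqrt(math.log(total) / n)
    return max(stats.items(), key=ucb)[0]
```

The bonus shrinks as an action's visit count grows, so search effort gradually concentrates on the actions whose sampled returns look best.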

- Timothy P. Lillicrap, Jonathan J. Hunt, +5 authors Daan Wierstra
- arXiv
- 2015

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20…
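The deterministic policy gradient this algorithm ascends is E[∇_a Q(s, μ_θ(s)) · ∇_θ μ_θ(s)]. A toy one-parameter illustration with a hand-built critic (the full algorithm, DDPG, learns Q with a second network and adds target networks and replay — none of that is shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0                      # deterministic linear policy: mu(s) = theta * s

def dQ_da(s, a):
    # Hand-built critic gradient: Q(s, a) = -(a - 2*s)**2, so the best action is a* = 2*s.
    return -2.0 * (a - 2.0 * s)

for _ in range(500):
    s = rng.normal(size=32)      # batch of states
    a = theta * s                # actions chosen by the current policy
    # deterministic policy gradient: dQ/da * d(mu)/d(theta), averaged over the batch
    theta += 0.05 * float(np.mean(dQ_da(s, a) * s))
```

After the loop, theta approaches 2, matching the critic's optimal action coefficient — the policy is improved purely by following the critic's action-gradient, with no sampling over actions.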

- David Silver, Aja Huang, +17 authors Demis Hassabis
- Nature
- 2016

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural…

- David Silver
- AIIDE
- 2005

Cooperative Pathfinding is a multi-agent path planning problem where agents must find non-colliding routes to separate destinations, given full information about the routes of other agents. This paper presents three new algorithms for efficiently solving this problem, suitable for use in Real-Time Strategy games and other real-time environments. The…
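The shared data structure behind these algorithms is a space-time reservation table: agents plan one at a time, treating cells reserved by earlier agents at a given timestep as blocked. A simplified breadth-first sketch (the paper's algorithms use A* with abstract-distance heuristics and a windowed variant, WHCA*; post-goal occupancy is ignored here, and the swap check is a conservative approximation):

```python
from collections import deque

def plan(start, goal, size, reserved, max_t=50):
    """Breadth-first search over (cell, time) states on a size x size grid;
    a move is illegal if the destination is reserved at the arrival time,
    or if it would swap places with a reserved trajectory."""
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # wait or step
    frontier = deque([(start, [start])])
    seen = {(start, 0)}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path
        t = len(path)                       # arrival time of the next step
        for dx, dy in moves:
            nx, ny = x + dx, y + dy
            swap = (nx, ny, t - 1) in reserved and (x, y, t) in reserved
            if (0 <= nx < size and 0 <= ny < size and t <= max_t
                    and (nx, ny, t) not in reserved and not swap
                    and ((nx, ny), t) not in seen):
                seen.add(((nx, ny), t))
                frontier.append(((nx, ny), path + [(nx, ny)]))
    return None

def cooperative_plan(agents, size=5):
    """Plan agents sequentially, reserving every (cell, time) on each path."""
    reserved, paths = set(), []
    for start, goal in agents:
        path = plan(start, goal, size, reserved)
        if path is None:
            raise RuntimeError("no conflict-free path within the horizon")
        for t, cell in enumerate(path):
            reserved.add((*cell, t))
        paths.append(path)
    return paths
```

With two agents heading straight at each other, the second agent's search is forced to wait or route around the first agent's reserved trajectory rather than collide.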

- Hado van Hasselt, Arthur Guez, David Silver
- AAAI
- 2016

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether this harms performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the…
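The remedy developed in the paper (Double DQN) decouples action selection from action evaluation in the bootstrap target. A sketch of the two targets side by side, with arrays standing in for network outputs at the next state:

```python
import numpy as np

def double_q_target(r, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network SELECTS the next action,
    the target network EVALUATES it, damping the max-operator
    overestimation of standard Q-learning."""
    a_star = int(np.argmax(q_online_next))    # selection: online net
    return r + gamma * q_target_next[a_star]  # evaluation: target net

def standard_q_target(r, q_target_next, gamma=0.99):
    """Standard target: select and evaluate with the same estimates,
    which is biased upward when those estimates are noisy."""
    return r + gamma * np.max(q_target_next)
```

When the two networks disagree about which action is best, the double estimator refuses to credit an action with a value it only appears to have under one set of noisy estimates.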

- Tom Schaul, John Quan, Ioannis Antonoglou, David Silver
- arXiv
- 2015

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop…
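The proportional scheme the paper develops draws transition i with probability p_i^α / Σ_k p_k^α, where p_i is typically |TD error| plus a small constant. A minimal sketch (a real implementation uses a sum-tree for O(log N) sampling; the linear scan here is only for clarity):

```python
import random

def sample_prioritized(priorities, batch_size, alpha=0.6, rng=None):
    """Proportional prioritized sampling from a replay buffer.
    Returns the sampled indices and each transition's sampling probability,
    which a learner needs to compute importance-sampling corrections."""
    rng = rng or random.Random(0)
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    idx = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    return idx, probs
```

Because high-priority transitions are replayed far more often, the paper also reweights their updates by importance-sampling weights proportional to (N·P(i))^(-β) to keep the learning target unbiased; that correction is omitted from this sketch.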