# Deep Reinforcement Learning with Double Q-Learning

@article{Hasselt2016DeepRL, title={Deep Reinforcement Learning with Double Q-Learning}, author={Hado van Hasselt and Arthur Guez and David Silver}, journal={ArXiv}, year={2016}, volume={abs/1509.06461} }

The popular Q-learning algorithm is known to overestimate action values under certain conditions. [...] We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several…
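The core change the paper proposes is small: standard DQN uses the target network both to select and to evaluate the next action, while Double DQN selects with the online network and evaluates with the target network. A minimal sketch of the two target computations (with hypothetical action values; the function names are illustrative, not from the paper):

```python
import numpy as np

def dqn_target(q_target_next, reward, gamma):
    # Standard DQN: the target network both selects and evaluates the
    # next action (a max over noisy estimates), which biases the
    # target upward.
    return reward + gamma * np.max(q_target_next)

def double_dqn_target(q_online_next, q_target_next, reward, gamma):
    # Double DQN: the online network selects the greedy action and the
    # target network evaluates it, decoupling selection from
    # evaluation to reduce the overestimation.
    a = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a]

# Hypothetical next-state action values for a 3-action task.
q_online_next = np.array([1.0, 2.0, 0.5])
q_target_next = np.array([0.0, 1.0, 3.0])
print(dqn_target(q_target_next, reward=1.0, gamma=0.99))        # 1 + 0.99 * 3.0
print(double_dqn_target(q_online_next, q_target_next, 1.0, 0.99))  # 1 + 0.99 * 1.0
```

Note that both use the same networks; only the pairing of selection and evaluation changes, which is why the adaptation fits the existing DQN training loop.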

## Figures and Tables from this paper

## 3,371 Citations

Exploring Deep Reinforcement Learning with Multi Q-Learning

- Computer Science
- 2016

This paper presents a new algorithm called Multi Q-learning to attempt to overcome the instability seen in Q-learning, and tests the algorithm on a 4 × 4 grid-world with different stochastic reward functions using various deep neural networks and convolutional networks.

Deep Reinforcement Learning with Weighted Q-Learning

- Computer Science, ArXiv
- 2020

This work provides the methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes.

Mixing Update Q-value for Deep Reinforcement Learning

- Computer Science, 2019 International Joint Conference on Neural Networks (IJCNN)
- 2019

This paper proposes a novel mechanism, building on Double Q-learning, to minimize the effects of overestimation on both the critic and the actor by mixing the updated action value based on the minimum and maximum between a pair of critics.

The Deep Quality-Value Family of Deep Reinforcement Learning Algorithms

- Computer Science, 2020 International Joint Conference on Neural Networks (IJCNN)
- 2020

DQV and DQV-Max present several important benefits: they converge significantly faster, can achieve super-human performance on DRL testbeds where DQN and DDQN fail to do so, and suffer less from the overestimation bias of the Q function.

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

- Computer Science, ArXiv
- 2015

When the discount factor is progressively increased up to its final value, it is empirically shown that both the number of learning steps and the chance of falling into a local optimum during learning can be significantly reduced, connecting the discussion with the exploration/exploitation dilemma.

Cross Learning in Deep Q-Networks

- Computer Science, ArXiv
- 2020

In this work, we propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods, particularly in the deep…

Historical Best Q-Networks for Deep Reinforcement Learning

- Computer Science, 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)
- 2018

This paper presents multiple target networks as an extension of Deep Q-Networks (DQN), choosing the several best-performing networks among all previously learned Q-value estimation networks as auxiliary networks.

Deep Reinforcement Learning with Averaged Target DQN

- Computer Science, ArXiv
- 2016

The Averaged Target DQN (ADQN) algorithm is presented, an adaptation to the DQN class of algorithms which uses a weighted average over past learned networks to reduce generalization noise variance.

Deep Reinforcement Learning with Adaptive Combined Critics

- Computer Science
- 2020

This paper proposes a novel algorithm that can minimize the overestimation, avoid the underestimation bias and retain the policy improvement during the whole training process, and evaluates the method on a set of classical control tasks.

Reinforcement Learning in Pacman

- Computer Science
- 2017

Deep Q-learning (DQL) can implicitly ‘extract’ important features, and interpolate Q-values for enormous numbers of state-action pairs without consulting large data tables.

## References

Showing 1–10 of 33 references

Dueling Network Architectures for Deep Reinforcement Learning

- Computer Science, ICML
- 2016

This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.

Issues in Using Function Approximation for Reinforcement Learning

- Computer Science
- 1999

This paper gives a theoretical account of the phenomenon, deriving conditions under which one may expect it to cause learning to fail, and presents experimental results which support the theoretical findings.

Double Q-learning

- Computer Science, NIPS
- 2010

An alternative way to approximate the maximum expected value for any set of random variables is introduced, and the obtained double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value.
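The single-versus-double estimator contrast behind that paper can be seen in a small hypothetical simulation (not from the paper): several actions all have true expected value 0, so the true maximum expected value is 0, yet taking the max of noisy sample means is biased upward. The double estimator uses one sample set to select the best action and an independent set to evaluate it, removing the upward bias.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: 5 actions, each with true expected value 0,
# observed through unit-variance Gaussian noise, so max_a E[X_a] = 0.
n_actions, n_samples, n_trials = 5, 10, 20000
single, double = [], []
for _ in range(n_trials):
    mu_a = rng.normal(0.0, 1.0, size=(n_actions, n_samples)).mean(axis=1)
    mu_b = rng.normal(0.0, 1.0, size=(n_actions, n_samples)).mean(axis=1)
    # Single estimator: max of the sample means (biased upward).
    single.append(mu_a.max())
    # Double estimator: sample set A selects the action, the
    # independent set B evaluates it (never biased upward).
    double.append(mu_b[int(np.argmax(mu_a))])

print(np.mean(single))  # clearly positive: overestimates the true max of 0
print(np.mean(double))  # close to 0 in this symmetric case
```

With distinct true action values the double estimator can dip below the true maximum, which is the underestimation the abstract mentions; it trades an upward bias for a (bounded) downward one.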

Playing Atari with Deep Reinforcement Learning

- Computer Science, ArXiv
- 2013

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Deep Attention Recurrent Q-Network

- Computer Science, ArXiv
- 2015

Tests of the proposed Deep Attention Recurrent Q-Network (DARQN) algorithm on multiple Atari 2600 games show a level of performance superior to that of DQN.

Massively Parallel Methods for Deep Reinforcement Learning

- Computer Science, ArXiv
- 2015

This work presents the first massively distributed architecture for deep reinforcement learning, using a distributed neural network to represent the value function or behaviour policy, and a distributed store of experience to implement the Deep Q-Network algorithm.

Human-level control through deep reinforcement learning

- Computer Science, Nature
- 2015

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

- Computer Science, J. Mach. Learn. Res.
- 2016

It is shown that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training.

Gradient temporal-difference learning algorithms

- Computer Science
- 2011

We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, both in terms of memory and per-time-step computation, scales linearly with…

Human-like Autonomous Vehicle Speed Control by Deep Reinforcement Learning with Double Q-Learning

- Computer Science, 2018 IEEE Intelligent Vehicles Symposium (IV)
- 2018

A reinforcement learning approach called Double Q-learning is used to control a vehicle’s speed based on an environment constructed from naturalistic driving data, and a new method, called the integrated perception approach, is proposed to construct that environment.