• Corpus ID: 6208256

Deep Reinforcement Learning with Double Q-Learning

@article{Hasselt2016DeepRL,
  title={Deep Reinforcement Learning with Double Q-Learning},
  author={Hado van Hasselt and Arthur Guez and David Silver},
  journal={ArXiv},
  year={2016},
  volume={abs/1509.06461}
}
The popular Q-learning algorithm is known to overestimate action values under certain conditions. […] We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several…
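As a concrete illustration of the idea summarized above, the following is a minimal sketch contrasting the standard DQN target, in which the target network both selects and evaluates the greedy action, with the Double DQN target, in which the online network selects the action and the target network evaluates it. The function and variable names, and the example Q-values, are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    # Standard DQN: the target network both selects and evaluates the action,
    # which is the source of the overestimation discussed in the paper.
    return reward + (0.0 if done else gamma * np.max(next_q_target))

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    # Double DQN: the online network selects the greedy action,
    # the target network evaluates that action.
    best_action = int(np.argmax(next_q_online))
    return reward + (0.0 if done else gamma * next_q_target[best_action])

# Hypothetical Q-values for three actions in the next state.
q_online = np.array([1.0, 2.5, 2.0])
q_target = np.array([1.2, 1.8, 2.4])
print(dqn_target(1.0, q_target))                    # max over the target net only
print(double_dqn_target(1.0, q_online, q_target))   # decoupled selection/evaluation
```

Decoupling action selection from action evaluation is the core change: the max operator no longer uses the same (noisy) estimates to both pick and value an action.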

Citations

Exploring Deep Reinforcement Learning with Multi Q-Learning
TLDR
This paper presents a new algorithm called Multi Q-learning that attempts to overcome the instability seen in Q-learning, and tests the algorithm on a 4 × 4 grid-world with different stochastic reward functions using various deep neural networks and convolutional networks.
Deep Reinforcement Learning with Weighted Q-Learning
TLDR
This work provides the methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes.
Mixing Update Q-value for Deep Reinforcement Learning
  • Zhunan Li, Xinwen Hou
  • Computer Science
    2019 International Joint Conference on Neural Networks (IJCNN)
  • 2019
TLDR
This paper proposes a novel mechanism, building on Double Q-learning, that limits overestimation's effects on both the critic and the actor by mixing the update action value based on the minimum and maximum of a pair of critics.
The Deep Quality-Value Family of Deep Reinforcement Learning Algorithms
TLDR
DQV and DQV-Max present several important benefits: they converge significantly faster, can achieve super-human performance on DRL testbeds on which DQN and DDQN failed to do so, and suffer less from the overestimation bias of the Q function.
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies
TLDR
It is empirically shown that progressively increasing the discount factor up to its final value can significantly reduce the number of learning steps and the chance of falling into a local optimum during learning, connecting the discussion with the exploration/exploitation dilemma.
Cross Learning in Deep Q-Networks
In this work, we propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods, particularly in the deep…
Historical Best Q-Networks for Deep Reinforcement Learning
TLDR
This paper presents multiple target networks as an extension to Deep Q-Networks (DQN), choosing the several best-performing of all previously learned Q-value estimation networks as auxiliary networks.
Deep Reinforcement Learning with Averaged Target DQN
TLDR
The Averaged Target DQN (ADQN) algorithm is presented, an adaptation of the DQN class of algorithms that uses a weighted average over past learned networks to reduce generalization noise variance.
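As a rough illustration of the averaging idea described in this entry, the sketch below forms a target from the next-state Q-value estimates of several retained past networks, using a uniform average as one simple instance of a weighted average. The function and variable names are illustrative assumptions, not ADQN's actual code.

```python
import numpy as np

def averaged_target(reward, next_q_snapshots, gamma=0.99, done=False):
    # next_q_snapshots: list of Q-value arrays for the next state,
    # one per retained past network.
    avg_q = np.mean(np.stack(next_q_snapshots), axis=0)  # average first, then max
    return reward + (0.0 if done else gamma * np.max(avg_q))

# Example with two retained snapshots and three actions.
snapshots = [np.array([1.0, 2.0, 1.5]), np.array([0.8, 2.2, 1.4])]
print(averaged_target(1.0, snapshots))
```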
Deep Reinforcement Learning with Adaptive Combined Critics
  • Computer Science
  • 2020
TLDR
This paper proposes a novel algorithm that can minimize overestimation, avoid underestimation bias, and retain policy improvement throughout training, and evaluates the method on a set of classical control tasks.
Reinforcement Learning in Pacman
TLDR
Deep Q-learning (DQL) can implicitly ‘extract’ important features and interpolate Q-values over an enormous number of state-action pairs without consulting large data tables.

References

SHOWING 1-10 OF 33 REFERENCES
Dueling Network Architectures for Deep Reinforcement Learning
TLDR
This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
Issues in Using Function Approximation for Reinforcement Learning
TLDR
This paper gives a theoretical account of the phenomenon, deriving conditions under which one may expect it to cause learning to fail, and presents experimental results which support the theoretical findings.
Double Q-learning
TLDR
An alternative way to approximate the maximum expected value for any set of random variables is introduced, and the resulting double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value.
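For reference, the tabular double estimator described here can be sketched as follows: two tables QA and QB are kept, one selects the greedy next action while the other evaluates it, with the roles swapped at random on each update. This is a minimal sketch assuming integer-indexed states and actions; the names are illustrative.

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=np.random):
    # Randomly choose which estimator to update; the other evaluates the action.
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))  # QA selects the greedy action
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])  # QB evaluates it
    else:
        b_star = int(np.argmax(QB[s_next]))  # QB selects the greedy action
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])  # QA evaluates it

# Example: 5 states, 2 actions, a single transition (s=0, a=1, r=1.0, s'=2).
QA = np.zeros((5, 2)); QB = np.zeros((5, 2))
double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2)
```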
Playing Atari with Deep Reinforcement Learning
TLDR
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Deep Attention Recurrent Q-Network
TLDR
Tests of the proposed Deep Attention Recurrent Q-Network (DARQN) algorithm on multiple Atari 2600 games show a level of performance superior to that of DQN.
Massively Parallel Methods for Deep Reinforcement Learning
TLDR
This work presents the first massively distributed architecture for deep reinforcement learning, using a distributed neural network to represent the value function or behaviour policy, and a distributed store of experience to implement the Deep Q-Network algorithm.
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
TLDR
It is shown that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training.
Gradient temporal-difference learning algorithms
We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, both in terms of memory and per-time-step computation, scales linearly with the number of features.
Human-like Autonomous Vehicle Speed Control by Deep Reinforcement Learning with Double Q-Learning
TLDR
A reinforcement learning approach called Double Q-learning is used to control a vehicle’s speed based on an environment constructed from naturalistic driving data, and a new method called the integrated perception approach is proposed to construct that environment.