Corpus ID: 216144609

Evolution of Q Values for Deep Q Learning in Stable Baselines

@article{Andrews2020EvolutionOQ,
  title={Evolution of Q Values for Deep Q Learning in Stable Baselines},
  author={Matthew Andrews and Cemil Dibek and Karina Palyutina},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.11766}
}
We investigate the evolution of the Q values for the implementation of Deep Q Learning (DQL) in the Stable Baselines library. Stable Baselines incorporates the latest Reinforcement Learning techniques and achieves superhuman performance in many game environments. However, for some simple non-game environments, the DQL in Stable Baselines can struggle to find the correct actions. In this paper we aim to understand the types of environment where this suboptimal behavior can happen, and also… 
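For context, here is a minimal sketch of how the DQL implementation in Stable Baselines is typically invoked; the environment and hyperparameters below are illustrative placeholders, not the paper's experimental setup.

# Minimal sketch: training DQN from Stable Baselines on a simple environment.
# Hyperparameters and the environment are illustrative, not the paper's settings.
import gym
from stable_baselines import DQN

env = gym.make("CartPole-v1")          # stand-in for the paper's non-game environments
model = DQN("MlpPolicy", env,
            learning_rate=1e-3,
            buffer_size=50000,
            exploration_fraction=0.1,
            verbose=1)
model.learn(total_timesteps=100000)    # the Q values evolve over these updates

obs = env.reset()
action, _ = model.predict(obs, deterministic=True)  # greedy action from the learned Q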
1 Citation

Graph Neural Networks for Image Classification and Reinforcement Learning using Graph representations

Evaluates the performance of graph neural networks in two distinct domains, computer vision and reinforcement learning, to learn whether a novel non-redundant representation of images as graphs can improve performance over a trivial pixel-to-node mapping on a graph-level prediction task.

References

Showing 1-10 of 23 references

Towards Characterizing Divergence in Deep Q-Learning

An algorithm is developed which permits stable deep Q-learning for continuous control without any of the tricks conventionally used (such as target networks, adaptive gradient optimizers, or using multiple Q functions).

A Theoretical Analysis of Deep Q-Learning

This work makes the first attempt to theoretically understand the deep Q-network (DQN) algorithm from both algorithmic and statistical perspectives, and proposes the Minimax-DQN algorithm for two-player zero-sum Markov games.

Diagnosing Bottlenecks in Deep Q-learning Algorithms

It is found that large neural network architectures have many benefits with regard to learning stability and offer several practical ways to compensate for overfitting; a novel sampling method based on explicitly compensating for function approximation error yields a fair improvement on high-dimensional continuous control domains.

Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

Intrinsic fear, a new method that mitigates these problems by avoiding dangerous states, is introduced; it learns a reward shaping that accelerates learning and guards oscillating policies against repeated catastrophes.
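A hedged sketch of the fear-penalized target implied by that description, assuming a hypothetical danger_model that estimates the probability of an imminent catastrophe; all names and the coefficient are illustrative, not the cited paper's exact interface.

# Sketch of the intrinsic-fear idea: a learned danger model penalizes the
# Q-learning target near states that precede catastrophes (illustrative names).
import torch

def fear_shaped_target(reward, next_obs, done, q_net, danger_model,
                       gamma=0.99, fear_coef=1.0):
    with torch.no_grad():
        p_danger = danger_model(next_obs).squeeze(-1)   # P(catastrophe soon | s')
        max_q = q_net(next_obs).max(dim=1).values
        # Subtracting the fear term steers the policy away from dangerous states.
        return reward - fear_coef * p_danger + gamma * (1.0 - done) * max_q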

Dueling Network Architectures for Deep Reinforcement Learning

This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
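A minimal sketch of the dueling decomposition Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a'), written as a PyTorch-style module; the layer sizes and names are illustrative, not the architecture used in Stable Baselines.

# Dueling Q-network sketch: separate value and advantage streams.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)   # advantages A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        v = self.value(h)
        a = self.advantage(h)
        # Subtracting the mean advantage makes the decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)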

Deep Reinforcement Learning and the Deadly Triad

This work investigates the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models (deep Q-networks trained with experience replay), analysing how the components of this system play a role in the emergence of the deadly triad and in the agent's performance.

Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning

Averaged-DQN is a simple extension to the DQN algorithm, based on averaging previously learned Q-value estimates, which leads to a more stable training procedure and improved performance by reducing approximation error variance in the target values.
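A hedged sketch of the averaging idea described above, assuming past_q_nets holds the K most recent snapshots of the learned Q-network; the function name and signature are illustrative.

# Averaged-DQN target sketch: the bootstrap value averages the K most
# recently learned Q-networks before taking the max over actions.
import torch

def averaged_dqn_target(reward, next_obs, done, past_q_nets, gamma=0.99):
    with torch.no_grad():
        # Average Q estimates over the K stored network snapshots.
        q_avg = torch.stack([q(next_obs) for q in past_q_nets]).mean(dim=0)
        max_q = q_avg.max(dim=1).values
        return reward + gamma * (1.0 - done) * max_q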

Addressing Function Approximation Error in Actor-Critic Methods

This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
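A hedged sketch of the clipped target described above; q1_next and q2_next stand for the two critics' estimates at the next state-action pair, and all names are illustrative.

# Clipped double-Q target sketch: bootstrap from the minimum of two critics
# to limit overestimation.
import torch

def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    with torch.no_grad():
        min_q = torch.min(q1_next, q2_next)
        return reward + gamma * (1.0 - done) * min_q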

Playing Atari with Deep Reinforcement Learning

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Non-delusional Q-learning and value-iteration

A new notion of policy consistency is introduced and a local backup process is defined that ensures global consistency through the use of information sets---sets that record constraints on policies consistent with backed-up Q-values, yielding the first known algorithms that guarantee optimal results under general conditions.