# Deep Recurrent Q-Learning for Partially Observable MDPs

@inproceedings{Hausknecht2015DeepRQ, title={Deep Recurrent Q-Learning for Partially Observable MDPs}, author={Matthew J. Hausknecht and Peter Stone}, booktitle={AAAI Fall Symposia}, year={2015} }

Deep Reinforcement Learning has yielded proficient controllers for complex tasks. [...] Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer; and while recurrency confers no systematic advantage when learning to play the game, the recurrent net can better adapt at evaluation time if the quality of observations changes.
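The architectural change described above, replacing the DQN's stacked-frame input with a recurrent layer that carries history in its hidden state, can be sketched in a few lines. This is a minimal illustration only: a plain tanh recurrence stands in for the LSTM used in the paper, the convolutional feature extractor is reduced to a linear map, and all sizes and names (`FRAME_DIM`, `HIDDEN`, etc.) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 4      # small discrete action set (e.g. a reduced Atari set)
FRAME_DIM = 64     # flattened per-frame features (conv output in the paper)
HIDDEN = 32        # recurrent hidden-state size

# Parameters of a toy recurrent Q-head (randomly initialised, untrained).
W_in = rng.normal(scale=0.1, size=(HIDDEN, FRAME_DIM))
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_q = rng.normal(scale=0.1, size=(N_ACTIONS, HIDDEN))

def drqn_q_values(frames):
    """Process one frame at a time, carrying a hidden state across steps.

    Unlike DQN, which concatenates a fixed history of frames into one
    input, the history here lives entirely in the hidden state `h`.
    """
    h = np.zeros(HIDDEN)
    for x in frames:
        h = np.tanh(W_in @ x + W_h @ h)  # simple RNN step (LSTM in the paper)
    return W_q @ h                        # one Q-value per action

# A 10-frame episode prefix; the network sees a single frame per step.
episode = [rng.normal(size=FRAME_DIM) for _ in range(10)]
q = drqn_q_values(episode)
action = int(np.argmax(q))
```

Because the hidden state summarises the whole prefix, the same network can be evaluated on histories of any length, which is what allows the adaptation at evaluation time noted in the abstract.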


## 912 Citations

On Improving Deep Reinforcement Learning for POMDPs

- Computer Science, Mathematics · ArXiv
- 2017

This work proposes a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains and demonstrates the effectiveness of the new architecture in several partially observable domains, including flickering Atari games.

Memory-based Deep Reinforcement Learning for POMDPs

- Computer Science · 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2021

Long-Short-Term-Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3) is proposed by introducing a memory component to TD3, and its performance is compared with other DRL algorithms in both MDPs and POMDPs.

Deep Q-Network with Predictive State Models in Partially Observable Domains

- Computer Science
- 2020

A recurrent network is used to establish the recurrent PSR model, which can fully learn dynamics of the partially continuous observable environment and makes DQN no longer rely on a fixed number of history observations or recurrent neural network (RNN) to represent states in the case of partially observable environments.

Deep Reinforcement Learning with POMDPs

- 2015

Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has looked at treating the Atari…

Reinforcement Learning via Recurrent Convolutional Neural Networks

- Computer Science, Mathematics · 2016 23rd International Conference on Pattern Recognition (ICPR)
- 2016

This work presents a more natural representation of the solutions to Reinforcement Learning (RL) problems, within three Recurrent Convolutional Neural Network (RCNN) architectures, to better exploit the inherent structure of these problems.

Recurrent Reinforcement Learning: A Hybrid Approach

- Computer Science · ArXiv
- 2015

This work investigates a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain, and proposes a new family of hybrid models that combines the strength of both supervised learning and reinforcement learning, trained in a joint fashion.

Deep Recurrent Q-Learning vs Deep Q-Learning on a simple Partially Observable Markov Decision Process with Minecraft

- Computer Science · ArXiv
- 2019

This work uses Minecraft for its customization advantages and designs two very simple missions that can be framed as a Partially Observable Markov Decision Process, comparing the Deep Recurrent Q-Network with the Deep Q-Network to see whether the former, which is trickier and longer to train, is always the best architecture when the agent has to deal with partial observability.

DQN: Does it scales?

- 2016

Deep Reinforcement Learning has recently emerged as a successful methodology for efficiently learning complex tasks and even succeeded in matching (and in some cases surpassing) human level…

Neural Map: Structured Memory for Deep Reinforcement Learning

- Computer Science, Mathematics · ICLR
- 2018

This paper develops a memory system with an adaptable write operator that is customized to the sorts of 3D environments that DRL agents typically interact with and demonstrates empirically that the Neural Map surpasses previous DRL memories on a set of challenging 2D and 3D maze environments.

Deep Recurrent Q-Network with Truncated History

- Computer Science · 2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI)
- 2018

Results show the necessity of using past information with a truncated length, rather than using only the current information or all of the past information in order to incorporate past information into the model.

## References

Showing 1–10 of 29 references

Human-level control through deep reinforcement learning

- Computer Science, Medicine · Nature
- 2015

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Solving Deep Memory POMDPs with Recurrent Policy Gradients

- Computer Science · ICANN
- 2007

This paper presents Recurrent Policy Gradients, a modelfree reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs)…

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

- Computer Science · NIPS
- 2014

The central idea is to use the slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play, and proposed new agents based on this idea are proposed and shown to outperform DQN.

Visualizing and Understanding Recurrent Networks

- Computer Science · ArXiv
- 2015

This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.

Long Short-Term Memory

- Computer Science, Medicine · Neural Computation
- 1997

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
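The gating mechanism summarised above can be written out directly. The sketch below is a single LSTM cell step in NumPy with the standard input, forget, and output gates; the sizes, initialisation, and the use of one weight matrix per gate over the concatenated input are illustrative choices, not the exact 1997 formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
X, H = 8, 16  # input and hidden sizes (illustrative)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, acting on the concatenated [x, h_prev].
W_i, W_f, W_o, W_g = (rng.normal(scale=0.1, size=(H, X + H)) for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    """One LSTM step. The cell state c plays the role of the 'constant
    error carousel': it is updated additively (f * c_prev + i * g),
    which is what lets error flow across long time lags."""
    z = np.concatenate([x, h_prev])
    i = sigmoid(W_i @ z)      # input gate
    f = sigmoid(W_f @ z)      # forget gate
    o = sigmoid(W_o @ z)      # output gate
    g = np.tanh(W_g @ z)      # candidate cell update
    c = f * c_prev + i * g    # additive cell-state update
    h = o * np.tanh(c)        # gated hidden output
    return h, c

h, c = np.zeros(H), np.zeros(H)
for _ in range(5):            # run a few steps on random inputs
    h, c = lstm_step(rng.normal(size=X), h, c)
```

The additive cell update is the key design choice: because `c` is not squashed through a nonlinearity at every step, its gradient path avoids the vanishing that plagues plain recurrent nets.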

The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)

- Computer Science · IJCAI
- 2015

The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.

Q-learning

- Computer Science · Machine Learning
- 2004

This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
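The convergence conditions stated above (a discrete value table, and every state–action pair sampled repeatedly) are easy to see on a toy problem. Below is a minimal tabular Q-learning run on a deterministic 4-state chain; the environment, step size, and episode count are all illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)

N, GOAL, GAMMA, ALPHA = 4, 3, 0.9, 0.1  # chain of states 0..3, goal at 3

def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right.
    Reward 1 on entering the goal state, 0 otherwise."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N)]      # discrete action-value table

for _ in range(5000):                   # behaviour policy: uniform random,
    s, done = 0, False                  # so every (s, a) is sampled repeatedly
    while not done:
        a = random.randrange(2)
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# The table approaches the optimal action-values:
# Q*(2, right) = 1, Q*(1, right) = 0.9, Q*(0, right) = 0.81
```

Under the random behaviour policy every pair is visited infinitely often in the limit, which is exactly the sampling condition the theorem requires; with a greedy behaviour policy that never explored leftward states, the guarantee would not hold.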

Language Understanding for Text-based Games using Deep Reinforcement Learning

- Computer Science · EMNLP
- 2015

This paper employs a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback to map text descriptions into vector representations that capture the semantics of the game states.

Reinforcement Learning: An Introduction

- Computer Science · IEEE Transactions on Neural Networks
- 2005

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Reinforcement Learning with Long Short-Term Memory

- Computer Science · NIPS
- 2001

Model-free RL-LSTM using Advantage (λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events.