• Corpus ID: 8696662

Deep Recurrent Q-Learning for Partially Observable MDPs

@inproceedings{Hausknecht2015DeepRQ,
  title={Deep Recurrent Q-Learning for Partially Observable MDPs},
  author={Matthew J. Hausknecht and Peter Stone},
  booktitle={AAAI Fall Symposia},
  year={2015}
}
Deep Reinforcement Learning has yielded proficient controllers for complex tasks. [...]
Key Result
Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer, and while recurrency confers no systematic advantage when learning to play the game, the recurrent net can better adapt at evaluation time if the quality of observations changes.
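
As a rough illustration of the key result above, the sketch below shows how a recurrent layer can stand in for frame stacking (a minimal sketch assuming PyTorch; the class name, layer sizes, and the standard DQN-style 84x84 single-channel encoder are assumptions for illustration, not the authors' code):

import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Recurrent Q-network: one frame per step, history kept in the LSTM."""
    def __init__(self, n_actions, hidden_dim=512):
        super().__init__()
        self.conv = nn.Sequential(  # DQN-style visual encoder
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(64 * 7 * 7, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 1, 84, 84) -- single frames, no stacking
        b, t = frames.shape[:2]
        feats = self.conv(frames.reshape(b * t, *frames.shape[2:]))
        out, hidden = self.lstm(feats.reshape(b, t, -1), hidden)
        return self.q_head(out), hidden  # per-step Q-values and LSTM state

# e.g. DRQN(n_actions=6)(torch.zeros(1, 10, 1, 84, 84)) yields Q-values of
# shape (1, 10, 6); carrying `hidden` across steps replaces the frame stack.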
On Improving Deep Reinforcement Learning for POMDPs
TLDR
This work proposes a new architecture called Action-specific Deep Recurrent Q-Network (ADRQN) to enhance learning performance in partially observable domains and demonstrates the effectiveness of the new architecture in several partially observable domains, including flickering Atari games.
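
A hypothetical sketch of the action-conditioning idea (assuming PyTorch; layer sizes and names are illustrative, not the ADRQN authors' code): the previous action is embedded and concatenated with the observation features before the LSTM, so the hidden state summarizes action-observation pairs rather than observations alone.

import torch
import torch.nn as nn

class ADRQN(nn.Module):
    def __init__(self, obs_dim, n_actions, act_embed=16, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_actions, act_embed)
        self.lstm = nn.LSTM(obs_dim + act_embed, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs, prev_actions, hidden=None):
        # obs: (batch, time, obs_dim); prev_actions: (batch, time) int64
        x = torch.cat([obs, self.embed(prev_actions)], dim=-1)
        out, hidden = self.lstm(x, hidden)
        return self.q_head(out), hidden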
Memory-based Deep Reinforcement Learning for POMDPs
  • Lingheng Meng, R. Gorbet, Dana Kulić
  • Computer Science
    2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2021
TLDR
Long-Short-Term-Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3) is proposed by introducing a memory component to TD3, and its performance is compared with other DRL algorithms in both MDPs and POMDPs.
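
The memory component can be pictured as below (a rough sketch assuming PyTorch; only the actor is shown, and TD3's twin critics, delayed policy updates, and target smoothing are assumed unchanged; names and sizes are illustrative):

import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.memory = nn.LSTM(obs_dim, hidden, batch_first=True)  # memory component
        self.pi = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); act on the last memory state
        h, hidden = self.memory(obs_seq, hidden)
        return self.pi(h[:, -1]), hidden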
Deep Q-Network with Predictive State Models in Partially Observable Domains
TLDR
A recurrent network is used to establish a recurrent PSR model that can fully learn the dynamics of a partially observable continuous environment, so that DQN no longer relies on a fixed number of history observations or a recurrent neural network (RNN) to represent states in partially observable environments.
Deep Reinforcement Learning with POMDPs
Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has looked at treating the Atari [...]
Reinforcement Learning via Recurrent Convolutional Neural Networks
TLDR
This work presents a more natural representation of the solutions to Reinforcement Learning (RL) problems within three Recurrent Convolutional Neural Network (RCNN) architectures, to better exploit the inherent structure of these problems.
Recurrent Reinforcement Learning: A Hybrid Approach
TLDR
This work investigates a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain, and proposes a new family of hybrid models that combines the strength of both supervised learning and reinforcement learning, trained in a joint fashion.
Deep Recurrent Q-Learning vs Deep Q-Learning on a simple Partially Observable Markov Decision Process with Minecraft
TLDR
This work uses Minecraft for its customization advantages and designs two very simple missions that can be framed as Partially Observable Markov Decision Processes, comparing the Deep Recurrent Q-Network against the Deep Q-Network to see whether the former, which is trickier and longer to train, is always the best architecture when the agent has to deal with partial observability.
DQN: Does it scales?
Deep Reinforcement Learning has recently emerged as a successful methodology for efficiently learning complex tasks and even succeeded in matching (and in some cases surpassing) human level [...]
Neural Map: Structured Memory for Deep Reinforcement Learning
TLDR
This paper develops a memory system with an adaptable write operator that is customized to the sorts of 3D environments that DRL agents typically interact with and demonstrates empirically that the Neural Map surpasses previous DRL memories on a set of challenging 2D and 3D maze environments.
Deep Recurrent Q-Network with Truncated History
  • Hyunwoo Oh, Tomoyuki Kaneko
  • Computer Science
    2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI)
  • 2018
TLDR
Results show the necessity of incorporating past information with a truncated history length, rather than using either only the current observation or all of the past information.
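
A minimal illustration of the truncated-history setting (function and variable names are assumptions, not the paper's code): sample fixed-length subsequences from an episodic replay buffer so the recurrent Q-network is unrolled over only the last few steps rather than over whole episodes.

import random

def sample_truncated(episodes, batch_size, history_len):
    """episodes: list of episodes, each a list of (obs, action, reward, done).
    Assumes at least one episode has length >= history_len."""
    batch = []
    while len(batch) < batch_size:
        ep = random.choice(episodes)
        if len(ep) < history_len:
            continue  # skip episodes shorter than the truncation window
        start = random.randint(0, len(ep) - history_len)
        batch.append(ep[start:start + history_len])
    return batch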

References

Showing 1-10 of 29 references
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Solving Deep Memory POMDPs with Recurrent Policy Gradients
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) [...]
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
TLDR
The central idea is to use the slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play; new agents based on this idea are proposed and shown to outperform DQN.
Visualizing and Understanding Recurrent Networks
TLDR
This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
Long Short-Term Memory
TLDR
A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
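
For reference, the now-standard LSTM cell updates can be written as below (note this is the later, commonly used formulation; the forget gate $f_t$ was a subsequent addition to the original 1997 design, which effectively fixed $f_t = 1$):

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \qquad
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \qquad
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
h_t = o_t \odot \tanh(c_t),
\end{aligned}

where the additive path through $c_t$ is the "constant error carousel" that keeps error flow constant across long time lags.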
The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)
TLDR
The promise of ALE is illustrated by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning, and an evaluation methodology made possible by ALE is proposed.
Q-learning
TLDR
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
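
The update the convergence theorem covers is the familiar tabular rule, sketched here in Python (variable names are illustrative):

from collections import defaultdict

Q = defaultdict(float)  # maps (state, action) -> estimated value

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.99, done=False):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r if done else r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])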
Language Understanding for Text-based Games using Deep Reinforcement Learning
TLDR
This paper employs a deep reinforcement learning framework to jointly learn state representations and action policies using game rewards as feedback to map text descriptions into vector representations that capture the semantics of the game states.
Reinforcement Learning: An Introduction
TLDR
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Reinforcement Learning with Long Short-Term Memory
TLDR
Model-free RL-LSTM using Advantage(λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events.