Reinforcement Learning with LSTM in Non-Markovian Tasks with Long-Term Dependencies
@inproceedings{Bakker2001ReinforcementLW, title={Reinforcement Learning with LSTM in Non-Markovian Tasks with Long-Term Dependencies}, author={Bram Bakker}, year={2001} }
This paper presents reinforcement learning with a Long Short-Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task.
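As a rough sketch of the idea (an assumed PyTorch rendering, not the author's code; all names are illustrative), an LSTM can map the observation history to per-action advantage estimates, so the recurrent state, rather than the current observation alone, carries the task-relevant memory:

```python
# Minimal sketch of the RL-LSTM idea, assuming PyTorch; not the paper's code.
# An LSTM summarizes the observation history in its recurrent state, and a
# linear head outputs one advantage estimate per action.
import torch
import torch.nn as nn

class RLLSTMAgent(nn.Module):  # illustrative name
    def __init__(self, n_obs: int, n_actions: int, hidden: int = 16):
        super().__init__()
        self.lstm = nn.LSTM(n_obs, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, n_obs); state is the LSTM memory (h, c)
        out, state = self.lstm(obs_seq, state)
        return self.head(out), state

agent = RLLSTMAgent(n_obs=4, n_actions=3)
obs = torch.zeros(1, 1, 4)            # only the current observation is fed in;
adv, state = agent(obs)               # earlier events live in `state`
action = adv[0, -1].argmax().item()   # act greedily w.r.t. the advantages
```

The point of the sketch is that the action values depend on hidden state rather than only the instantaneous observation, which is what makes non-Markovian tasks with long time lags tractable.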
References
Showing 1–10 of 32 references
Learning long-term dependencies with gradient descent is difficult
- Computer Science · IEEE Trans. Neural Networks · 1994
This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
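A tiny numeric illustration of that trade-off (the per-step factor of 0.9 is an assumed value, chosen only for illustration): if each backprop-through-time step scales the error signal by a factor below one, the gradient linking an outcome to an event T steps earlier decays geometrically.

```python
# Illustration only: with an assumed per-step gradient factor of 0.9, the
# credit assigned across a time lag of T steps shrinks geometrically.
for T in (10, 50, 100):
    print(f"lag {T:>3}: gradient factor ~ {0.9 ** T:.2e}")
# lag  10: ~3.49e-01, lag  50: ~5.15e-03, lag 100: ~2.66e-05
```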
A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
- Computer Science · NIPS · 1998
It is shown that the model learned by BLHT converges to one which provides the most accurate predictions of percepts and rewards, given short-term memory.
Learning to Forget: Continual Prediction with LSTM
- Computer Science · Neural Computation · 2000
This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.
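In the now-standard formulation (a sketch of the usual equations, not copied from the paper), the forget gate f_t scales the previous cell state, so driving f_t toward zero resets the cell:

```latex
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \qquad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```

where i_t is the input gate and \tilde{c}_t the candidate update; f_t → 0 clears the accumulated state.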
Reinforcement Learning in Markovian and Non-Markovian Environments
- Computer Science · NIPS · 1990
This work addresses three problems with reinforcement learning and adaptive neuro-control: non-Markovian interfaces between learner and environment, problems with parallel learning and how interacting model/controller systems can be combined with vector-valued 'adaptive critics'.
Trading off perception with internal state: reinforcement learning and analysis of Q-Elman networks in a Markovian task
- Computer Science · Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000) · 2000
This paper shows that using internal state, called "supportive state", may alleviate this problem, presenting an argument against the tendency to almost automatically use a direct mapping when the task is Markovian.
Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks
- Computer Science · 1996
U-Tree, a reinforcement learning algorithm that uses selective attention and short-term memory to simultaneously address the intertwined problems of large perceptual state spaces and hidden state, learns quickly, creates only task-relevant state distinctions, and handles noise well.
Long Short-Term Memory
- Computer Science · Neural Computation · 1997
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
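The constant error carousel can be summarized in one equation (sketched here in the simplified, forget-gate-free form of the original cell): the cell state is a pure accumulator, so its self-connection has derivative one and backpropagated error along it neither vanishes nor explodes.

```latex
c_t = c_{t-1} + i_t \odot \tilde{c}_t
\quad\Longrightarrow\quad
\frac{\partial c_t}{\partial c_{t-1}} = I
```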
HQ-Learning
- Computer Science · Adapt. Behav. · 1997
HQ-learning is a hierarchical extension of Q(λ)-learning designed to solve certain types of partially observable Markov decision problems; it is shown to solve partially observable mazes with more states than those used in most previous POMDP work.
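A minimal sketch of the decomposition idea (the structure and names are illustrative assumptions, not the paper's algorithm): a sequence of reactive subagents, each with its own action values and a learned subgoal observation, where control transfers once the subgoal is reached.

```python
# Sketch of an HQ-style episode, assuming each subagent is a pair of
# (q_values: obs -> list of action values, subgoal observation).
import random

def hq_episode(reset, env_step, subagents, epsilon=0.1):
    """subagents: list of (q_values, subgoal); env_step(a) -> (obs, r, done)."""
    obs, total = reset(), 0.0
    for q_values, subgoal in subagents:      # each subtask is purely reactive
        while obs != subgoal:                # hand over control at the subgoal
            acts = q_values[obs]
            a = (random.randrange(len(acts)) if random.random() < epsilon
                 else max(range(len(acts)), key=acts.__getitem__))
            obs, r, done = env_step(a)
            total += r
            if done:
                return total
    return total
```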
Overcoming Incomplete Perception with Utile Distinction Memory
- Computer Science · ICML · 1993
Learning Policies with External Memory
- Computer Science · ICML · 1999
This paper explores a stigmergic approach, in which the agent's actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent.
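A small sketch of that mechanism (helper names are hypothetical): the action set is extended with set/clear actions for each external memory bit, and the bits are appended to the observation the policy sees.

```python
# Illustrative sketch of a stigmergic external memory; names are hypothetical.
def augment_observation(base_obs, memory_bits):
    # The policy conditions on the observation plus the memory contents.
    return tuple(base_obs) + tuple(memory_bits)

def apply_memory_action(action, memory_bits, n_base_actions):
    # Actions beyond the base set write to memory instead of the environment:
    # action n_base_actions + 2k sets bit k, and n_base_actions + 2k + 1 clears it.
    k, op = divmod(action - n_base_actions, 2)
    memory_bits[k] = 1 if op == 0 else 0
    return memory_bits

bits = [0, 0]
bits = apply_memory_action(action=5, memory_bits=bits, n_base_actions=4)
print(augment_observation((1, 0), bits))  # (1, 0, 0, 0): bit 0 was cleared
```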