• Corpus ID: 16457223

Reinforcement Learning with LSTM in Non-Markovian Tasks with Long-Term Dependencies

@inproceedings{Bakker2001ReinforcementLW,
  title={Reinforcement Learning with LSTM in Non-Markovian Tasks with Long-Term Dependencies},
  author={Bram Bakker},
  year={2001}
}
  • B. Bakker
  • Published 2001
  • Computer Science, Psychology
This paper presents reinforcement learning with a Long Short-Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task.
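The paper itself defines the exact architecture and update rules; the sketch below is only a rough illustration of the ingredients named in the abstract, under several simplifying assumptions I am making here: a one-step Advantage-learning target instead of Advantage(λ) with eligibility traces, epsilon-greedy action selection instead of directed exploration, backpropagation truncated to a single step instead of backpropagation through time, and a hypothetical T-maze environment exposing reset()/step() that returns a list-valued observation, a scalar reward, and a done flag. PyTorch is used for the LSTM.

    # Rough sketch of an LSTM agent trained with a one-step Advantage-learning target.
    # Not the paper's implementation; the environment interface and hyperparameters are assumed.
    import torch
    import torch.nn as nn

    class LSTMAdvantageNet(nn.Module):
        def __init__(self, obs_dim, n_actions, hidden=16):
            super().__init__()
            self.cell = nn.LSTMCell(obs_dim, hidden)   # recurrent state carries information across time lags
            self.head = nn.Linear(hidden, n_actions)   # one Advantage value per action

        def forward(self, obs, state=None):
            h, c = self.cell(obs, state)
            return self.head(h), (h, c)

    def run_episode(env, net, opt, gamma=0.98, kappa=0.3, eps=0.1):
        obs = torch.tensor(env.reset(), dtype=torch.float32).unsqueeze(0)
        state, done = None, False
        while not done:
            adv, state = net(obs, state)
            # epsilon-greedy stands in for the paper's directed exploration
            if torch.rand(()).item() < eps:
                a = torch.randint(adv.shape[1], ()).item()
            else:
                a = adv.argmax().item()
            next_obs, r, done = env.step(a)               # assumed interface
            next_obs = torch.tensor(next_obs, dtype=torch.float32).unsqueeze(0)
            with torch.no_grad():
                v = adv.max()
                detached = (state[0].detach(), state[1].detach())
                v_next = 0.0 if done else net(next_obs, detached)[0].max()
                # Advantage-learning target: V(s) plus the TD error scaled by 1/kappa
                target = v + (r + gamma * v_next - v) / kappa
            loss = (adv[0, a] - target) ** 2
            opt.zero_grad(); loss.backward(); opt.step()
            state = (state[0].detach(), state[1].detach())  # one-step truncation, unlike the paper's BPTT
            obs = next_obs

Here opt would be something like torch.optim.RMSprop(net.parameters(), lr=1e-3); the exploration bonuses, eligibility traces, and network layout of the actual experiments are omitted.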

References

SHOWING 1-10 OF 32 REFERENCES
Learning long-term dependencies with gradient descent is difficult
TLDR
This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
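A toy illustration, mine rather than the cited paper's, of the trade-off it describes: in a scalar recurrent unit h_t = tanh(w * h_{t-1} + x_t), the gradient of the final state with respect to an early input is a product of per-step Jacobians, so in the contracting regime needed to latch information stably it decays exponentially with the time lag.

    # Toy demonstration of exponentially decaying gradients in a scalar recurrent unit.
    import torch

    def grad_wrt_first_input(T, w=0.9):
        x = torch.zeros(T, requires_grad=True)     # inputs x_1 .. x_T
        h = torch.tensor(0.0)
        for t in range(T):
            h = torch.tanh(w * h + x[t])           # h_t = tanh(w * h_{t-1} + x_t)
        h.backward()
        return x.grad[0].item()                    # d h_T / d x_1

    for T in (5, 20, 50, 100):
        print(T, grad_wrt_first_input(T))          # magnitude shrinks roughly like (w * tanh')**T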
A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
TLDR
It is shown that the model learned by BLHT converges to one which provides the most accurate predictions of percepts and rewards, given short-term memory.
Learning to Forget: Continual Prediction with LSTM
TLDR
This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.
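A minimal numpy sketch of that mechanism, with my own variable names and with the weight matrices in the dict p assumed to be given: the forget gate f multiplies the previous cell state, so when f saturates near zero the cell effectively resets itself, without the input stream having to be segmented in advance.

    # One step of an LSTM cell with a forget gate (weights in p are assumed given).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, p):
        f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + p["bf"])   # forget gate: near 0 resets the cell
        i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + p["bi"])   # input gate
        o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + p["bo"])   # output gate
        g = np.tanh(p["Wg"] @ x + p["Ug"] @ h_prev + p["bg"])   # candidate cell input
        c = f * c_prev + i * g                                   # old state survives only as far as f allows
        h = o * np.tanh(c)
        return h, c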
Reinforcement Learning in Markovian and Non-Markovian Environments
TLDR
This work addresses three problems with reinforcement learning and adaptive neuro-control: non-Markovian interfaces between learner and environment, problems with parallel learning and how interacting model/controller systems can be combined with vector-valued 'adaptive critics'.
Trading off perception with internal state: reinforcement learning and analysis of Q-Elman networks in a Markovian task
  • B. Bakker, G. van der Voort van der Kleij
  • Computer Science
    Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium
  • 2000
TLDR
This paper shows that using internal state, called "supportive state", may alleviate this problem, presenting an argument against the tendency to almost automatically use a direct mapping when the task is Markovian.
Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks
TLDR
U-Tree, a reinforcement learning algorithm that uses selective attention and shortterm memory to simultaneously address the intertwined problems of large perceptual state spaces and hidden state, learns quickly, creates only task-relevant state distinctions, and handles noise well.
Long Short-Term Memory
TLDR
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
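To contrast with the vanishing-gradient toy example given earlier, here is a small check, my own rather than the paper's, that a cell state with a fixed self-connection of weight 1 (the constant error carousel) passes error back unchanged: the per-step Jacobian of the cell state is exactly 1, so the gradient neither vanishes nor explodes over long lags.

    # Constant error carousel: a unit self-connection keeps the backpropagated gradient at 1.
    import torch

    T = 200
    inc = torch.zeros(T, requires_grad=True)   # gated increments entering the cell at each step
    c = torch.tensor(0.0)
    for t in range(T):
        c = 1.0 * c + inc[t]                   # c_t = c_{t-1} + increment_t
    c.backward()
    print(inc.grad[0].item())                  # prints 1.0 even after 200 steps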
HQ-Learning
TLDR
HQ-learning is a hierarchical extension of Q(λ)-learning designed to solve certain types of partially observable Markov decision problems; it is shown to solve partially observable mazes with more states than those used in most previous POMDP work.
Overcoming Incomplete Perception with Utile Distinction Memory
Learning Policies with External Memory
TLDR
This paper explores a stigmergic approach, in which the agent's actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent.
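A hypothetical wrapper, not code from the cited paper, sketching the stigmergic idea: the original action set is extended with set-bit and clear-bit actions on an external memory register, and the register's bits are appended to the observation, so a reactive policy over the augmented observation can carry information forward without a recurrent network. The wrapped environment is assumed to expose n_actions, reset(), step(), and observe(), with list-valued observations.

    # Hypothetical wrapper adding external memory bits to an environment's actions and observations.
    class ExternalMemoryWrapper:
        def __init__(self, env, n_bits=2):
            self.env = env
            self.n_bits = n_bits
            # original actions first, then "set bit k" / "clear bit k" for each bit
            self.n_actions = env.n_actions + 2 * n_bits

        def reset(self):
            self.bits = [0] * self.n_bits
            return self.env.reset() + self.bits            # observation = percept + memory bits

        def step(self, a):
            if a < self.env.n_actions:                     # ordinary action acts on the world
                obs, r, done = self.env.step(a)
            else:                                          # memory action flips a bit; the world idles
                k, flag = divmod(a - self.env.n_actions, 2)
                self.bits[k] = 1 - flag                    # flag 0 sets the bit, flag 1 clears it
                obs, r, done = self.env.observe(), 0.0, False
            return obs + self.bits, r, done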