Solving Deep Memory POMDPs with Recurrent Policy Gradients

@inproceedings{Wierstra2007SolvingDM,
  title={Solving Deep Memory POMDPs with Recurrent Policy Gradients},
  author={Daan Wierstra and Alexander F{\"o}rster and Jan Peters and J{\"u}rgen Schmidhuber},
  booktitle={ICANN},
  year={2007}
}
  • Daan Wierstra, Alexander Förster, Jan Peters, Jürgen Schmidhuber
  • Published in ICANN 2007
  • Computer Science
  • This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" architecture, we are able to outperform other…
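
    The idea of backpropagating return-weighted eligibilities through time can be illustrated with a small sketch. This is not the paper's implementation: it assumes a single-unit tanh RNN in place of the LSTM architecture named in the abstract, a binary (Bernoulli) action, and a constant baseline. All function and parameter names (`forward`, `policy_gradient`, `w_h`, `w_x`, `b`, `v`, `c`) are hypothetical. The sketch computes the gradient of sum_t (R_t - baseline) * log pi(a_t | o_1..o_t) for one recorded episode.

```python
import math


def forward(params, observations):
    """Run the scalar RNN over an episode.

    Returns the hidden states and P(action = 1 | history) at each step.
    """
    w_h, w_x, b, v, c = params
    h_prev = 0.0
    hs, ps = [], []
    for x in observations:
        h = math.tanh(w_h * h_prev + w_x * x + b)      # recurrent memory
        p = 1.0 / (1.0 + math.exp(-(v * h + c)))       # stochastic policy
        hs.append(h)
        ps.append(p)
        h_prev = h
    return hs, ps


def weighted_log_likelihood(params, observations, actions, returns, baseline=0.0):
    """Surrogate objective: sum_t (R_t - baseline) * log pi(a_t | history)."""
    _, ps = forward(params, observations)
    total = 0.0
    for p, a, r in zip(ps, actions, returns):
        total += (r - baseline) * math.log(p if a == 1 else 1.0 - p)
    return total


def policy_gradient(params, observations, actions, returns, baseline=0.0):
    """Gradient of the surrogate objective via backpropagation through time."""
    w_h, w_x, b, v, c = params
    hs, ps = forward(params, observations)
    g_wh = g_wx = g_b = g_v = g_c = 0.0
    dh_next = 0.0  # gradient reaching h_t from later time steps
    for t in reversed(range(len(observations))):
        # Return-weighted eligibility at the logit: for a sigmoid/Bernoulli
        # policy, d log pi(a_t) / d logit_t = a_t - p_t.
        dlogit = (returns[t] - baseline) * (actions[t] - ps[t])
        g_v += dlogit * hs[t]
        g_c += dlogit
        dh = dlogit * v + dh_next
        dpre = dh * (1.0 - hs[t] ** 2)                 # through tanh
        h_prev = hs[t - 1] if t > 0 else 0.0
        g_wh += dpre * h_prev
        g_wx += dpre * observations[t]
        g_b += dpre
        dh_next = dpre * w_h                           # pass back to h_{t-1}
    return [g_wh, g_wx, g_b, g_v, g_c]
```

    In a REINFORCE-style update one would move the parameters a small step along this gradient after each sampled episode. The step size, the baseline, and how returns are computed are design choices this sketch leaves open.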

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 82 CITATIONS

    Observer effect from stateful resources in agent sensing

    VIEW 29 EXCERPTS
    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    A study in direct policy search

    VIEW 9 EXCERPTS
    CITES METHODS & BACKGROUND

    Reinforcement Learning in Supervised Problem Domains

    VIEW 5 EXCERPTS
    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Guided Reinforcement Learning Under Partial Observability

    VIEW 3 EXCERPTS
    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Deep Variational Reinforcement Learning for POMDPs

    VIEW 11 EXCERPTS
    CITES METHODS
    HIGHLY INFLUENCED

    Learning Long-term Dependencies with Deep Memory States

    VIEW 4 EXCERPTS
    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    CITATION STATISTICS

    • 5 Highly Influenced Citations

    • Averaged 14 Citations per year from 2017 through 2019

    • 75% Increase in citations per year in 2019 over 2018

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 21 REFERENCES

    Reinforcement Learning with Long Short-Term Memory

    VIEW 6 EXCERPTS
    HIGHLY INFLUENTIAL

    Backpropagation through time: what does it do and how to do it

    VIEW 3 EXCERPTS
    HIGHLY INFLUENTIAL

    Toward effective combination of off-line and on-line training in ADP framework

    • D. Prokhorov
    • Computer Science
    • 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
    • 2007

    Policy Gradient Methods for Robotics

    • Jan Peters, Stefan Schaal
    • Computer Science
    • 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems
    • 2006
    VIEW 1 EXCERPT

    RNN overview (2004) http://www.idsia.ch/~juergen/rnn.html

    • J. Schmidhuber
    • 2004
    VIEW 1 EXCERPT

    Natural Actor-Critic

    VIEW 1 EXCERPT