Reducing policy degradation in neuro-dynamic programming

Thomas Gabel and Martin A. Riedmiller

We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced when performing reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to reflect on whether it is better to cease learning, and thus obtains more stable learning results.
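
The abstract does not say how the learning process is monitored or by what criterion the learner decides to cease learning. As a rough illustration only, the sketch below uses a generic patience-based rule: after each evaluation of the current policy it keeps a snapshot of the best parameters seen so far, and it stops once a number of consecutive evaluations fail to improve on that best. The names `LearningMonitor`, `run_monitored`, and `patience` are hypothetical, and this rule is an assumption, not the authors' method.

```python
import copy


class LearningMonitor:
    """Tracks evaluation scores and keeps a snapshot of the best parameters seen."""

    def __init__(self, patience=3):
        # Assumed knob: how many non-improving evaluations in a row are tolerated.
        self.patience = patience
        self.best_score = float("-inf")
        self.best_params = None
        self.bad_evals = 0

    def update(self, score, params):
        """Record one evaluation; return True if learning should cease."""
        if score > self.best_score:
            # Strict improvement: remember this policy and reset the counter.
            self.best_score = score
            self.best_params = copy.deepcopy(params)
            self.bad_evals = 0
        else:
            # No improvement (possible policy degradation): count it.
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def run_monitored(evaluations, patience=3):
    """Consume (score, params) pairs until the monitor says to stop.

    Returns the best parameters, their score, and how many evaluations were used.
    """
    monitor = LearningMonitor(patience)
    evals_used = 0
    for score, params in evaluations:
        evals_used += 1
        if monitor.update(score, params):
            break
    return monitor.best_params, monitor.best_score, evals_used


if __name__ == "__main__":
    # Made-up evaluation trace: the policy improves, then degrades.
    trace = [(-50, "q0"), (-12, "q1"), (-5, "q2"),
             (-9, "q3"), (-7, "q4"), (-30, "q5"), (-5, "q6")]
    print(run_monitored(trace, patience=3))
```

In a real training loop, the `(score, params)` pairs would come from periodically evaluating the greedy policy induced by the current value function approximator. The learner then returns the stored snapshot rather than whatever parameters it holds when training ends.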


Publications referenced by this paper.

Scheduling with adaptive agents - an empirical evaluation

  • W. Hunger, M. Riedmiller
  • Proceedings of EWRL-5, European Workshop on…
  • 2001


  • C. Watkins, P. Dayan
  • Machine Learning, 8:279–292
  • 1992
