Reducing policy degradation in neuro-dynamic programming

@inproceedings{Gabel2006ReducingPD,
  title={Reducing policy degradation in neuro-dynamic programming},
  author={Thomas Gabel and Martin A. Riedmiller},
  booktitle={ESANN},
  year={2006}
}
We focus on neuro-dynamic programming methods to learn state-action value functions and outline some of the inherent problems to be faced when performing reinforcement learning in combination with function approximation. In an attempt to overcome some of these problems, we develop a reinforcement learning method that monitors the learning process, enables the learner to reflect on whether it is better to cease learning, and thus obtains more stable learning results.
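The abstract's ingredients (a learned state-action value function, function approximation, and a monitor that decides whether to cease learning) can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the environment, feature choice, and all hyperparameters below are assumptions made for the example:

```python
import numpy as np

# Illustrative sketch only, NOT the paper's method: Q-learning with a linear
# (one-hot, effectively tabular) function approximator on a toy 5-state chain
# MDP, plus a monitor that periodically evaluates the greedy policy, remembers
# the best weights seen, and ceases learning once performance stops improving.

N_STATES, N_ACTIONS = 5, 2          # actions: 0 = left, 1 = right
GOAL = N_STATES - 1

def step(s, a):
    """Deterministic chain: reward 1 for reaching the goal state."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def features(s, a):
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0    # one-hot state-action features
    return phi

def q(w, s, a):
    return features(s, a) @ w       # linear state-action value function

def evaluate(w, max_steps=20):
    """Return collected by the greedy policy, starting from state 0."""
    s, total = 0, 0.0
    for _ in range(max_steps):
        a = int(np.argmax([q(w, s, b) for b in range(N_ACTIONS)]))
        s, r, done = step(s, a)
        total += r
        if done:
            break
    return total

def train(episodes=200, alpha=0.5, gamma=0.95, eps=0.3,
          eval_every=10, patience=5, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(N_STATES * N_ACTIONS)
    best_w, best_ret, stalled = w.copy(), -np.inf, 0
    for ep in range(episodes):
        s = 0
        for _ in range(10_000):     # safety cap on episode length
            if rng.random() < eps:  # epsilon-greedy exploration
                a = int(rng.integers(N_ACTIONS))
            else:
                a = int(np.argmax([q(w, s, b) for b in range(N_ACTIONS)]))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(
                q(w, s2, b) for b in range(N_ACTIONS))
            w += alpha * (target - q(w, s, a)) * features(s, a)
            s = s2
            if done:
                break
        if (ep + 1) % eval_every == 0:      # monitor the greedy policy
            ret = evaluate(w)
            if ret > best_ret:
                best_ret, best_w, stalled = ret, w.copy(), 0
            else:
                stalled += 1
                if stalled >= patience:     # no improvement: cease learning,
                    break                   # keep the best weights seen
    return best_w, best_ret

best_w, best_ret = train()
```

The monitoring step is the point of interest here: rather than training for a fixed budget, the learner keeps the best policy observed so far and stops once further updates no longer improve it, which guards against the policy degradation the abstract describes.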


