Learning to Act Using Real-Time Dynamic Programming

Acknowledgments

[...] for helping to clarify the relationships between heuristic search and control. We thank Rich Sutton, Chris Watkins, Paul Werbos, and Ron Williams for sharing their fundamental insights into this subject through numerous discussions, and we further thank Rich Sutton for first making us aware of Korf's research and for his very thoughtful comments on the manuscript. We are very grateful to Dimitri Bertsekas and Steven Sullivan for independently pointing out an error in an earlier version of this article. Finally, we thank Harry Klopf, whose insight and persistence encouraged our interest in this class of learning problems.

Abstract

Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning results into reactive strategies for real-time control, as well as for learning such strategies when the system being controlled is incompletely known. We introduce an algorithm based on DP, which we call Real-Time DP (RTDP), by which an embedded system can improve its performance with experience. RTDP generalizes Korf's Learning-Real-Time-A* algorithm to problems involving uncertainty. We invoke results from the theory of asynchronous DP to prove that RTDP achieves optimal behavior in several different classes of problems. We also use the theory of asynchronous DP to illuminate aspects of other DP-based reinforcement learning methods such as Watkins' Q-Learning algorithm. A secondary aim of this article is to provide a bridge between AI research on real-time planning and learning and relevant concepts and algorithms from control theory.
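To make the idea of improving performance with experience more concrete, here is a minimal Python sketch of a trial-based form of RTDP on a stochastic shortest-path problem. It is an illustration under stated assumptions, not the authors' implementation: the function name `rtdp`, its parameters, and the five-state toy problem are all hypothetical, and the sketch assumes the transition model is known and that cost estimates start at zero (which never overestimates when costs are non-negative). On each trial the agent backs up only the state it currently occupies, acts greedily with respect to the current estimates, and samples a successor.

```python
import random


def rtdp(states, actions, transitions, cost, start, goal,
         trials=200, seed=0, max_steps=10_000):
    """Trial-based RTDP sketch (hypothetical helper, not from the paper).

    transitions(s, a) -> list of (next_state, probability) pairs
    cost(s, a)        -> immediate non-negative cost of taking a in s
    Returns a dict mapping each state to its estimated cost-to-go.
    """
    rng = random.Random(seed)
    f = {s: 0.0 for s in states}  # assumed initial estimates: all zero

    def q(s, a):
        # One-step lookahead: immediate cost plus expected successor estimate.
        return cost(s, a) + sum(p * f[s2] for s2, p in transitions(s, a))

    for _ in range(trials):
        s = start
        for _ in range(max_steps):  # safety cap on trial length
            if s == goal:
                break
            best = min(actions, key=lambda a: q(s, a))  # greedy action
            f[s] = q(s, best)  # back up only the state being visited
            succs = transitions(s, best)
            s = rng.choices([s2 for s2, _ in succs],
                            weights=[p for _, p in succs])[0]
    return f


# Toy problem (an assumption for illustration): a chain 0..4 with goal 4.
GOAL = 4
STATES = range(GOAL + 1)
ACTIONS = ("safe", "risky")


def transitions(s, a):
    if a == "safe":
        return [(s + 1, 1.0)]          # always advances
    return [(s + 1, 0.8), (s, 0.2)]    # advances with probability 0.8


def cost(s, a):
    return 2.0 if a == "safe" else 1.0


values = rtdp(STATES, ACTIONS, transitions, cost, start=0, goal=GOAL)
```

In this toy problem the "risky" action has an expected cost of 1 / 0.8 = 1.25 per step of progress, which beats the "safe" cost of 2, so the optimal expected cost from state 0 is 4 × 1.25 = 5, and `values[0]` should approach that value from below as trials accumulate. Only states actually visited during trials are ever updated, which is the asynchronous aspect the abstract refers to.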

DOI: 10.1016/0004-3702(94)00011-O
