Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming


This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper, I present and show results for two Dyna architectures. The Dyna-PI architecture is based on dynamic programming's policy iteration method and can be related to existing AI ideas such as evaluation functions and universal plans (reactive systems). Using a navigation task, results are shown for a simple Dyna-PI system that simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. The Dyna-Q architecture is based on Watkins's Q-learning, a new kind of reinforcement learning. Dyna-Q uses a less familiar set of data structures than does Dyna-PI, but is arguably simpler to implement and use. I show that Dyna-Q architectures are easy to adapt for use in changing environments.

References (4 of 21 shown)

Barto, A. G., Sutton, R. S., & Watkins, C. J. C. H. (1990). Sequential decision problems and neural networks.

Barto, A. G., Sutton, R. S., & Watkins, C. J. C. H. (1989). Learning and sequential decision making.

Whitehead, S. D. (1989). Scaling reinforcement learning systems.

Rivest, R. L., & Schapire, R. E. (1987). A new approach to unsupervised learning in deterministic environments.