Planning by Incremental Dynamic Programming

@inproceedings{Sutton1991PlanningBI,
  title={Planning by Incremental Dynamic Programming},
  author={Richard S. Sutton},
  booktitle={ML},
  year={1991}
}
  • R. Sutton
  • Published in ML 1991
  • Computer Science
Abstract
This paper presents the basic results and ideas of dynamic programming as they relate most directly to the concerns of planning in AI. [...]
Key Method
These incremental planning methods are based on continually updating an evaluation function and the situation-action mapping of a reactive system. Actions are generated by the reactive system and thus involve minimal delay, while the incremental planning process guarantees that the actions and evaluation function will eventually be optimal—no matter how…
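The key method above admits a compact illustration. The following is a minimal sketch, assuming a small tabular MDP with a known transition model P and reward model R (illustrative names, not notation from the paper): the reactive system always acts greedily with respect to the current evaluation function, while the incremental planning process performs one asynchronous Bellman backup per step, so action generation never waits on planning.

import numpy as np

# Hypothetical tabular MDP: P[s, a] is a distribution over next states,
# R[s, a] is the expected immediate reward. These names are illustrative
# assumptions, not from the paper.
n_states, n_actions, gamma = 25, 4, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

V = np.zeros(n_states)  # evaluation function, updated incrementally

def greedy_action(s):
    # Reactive component: act immediately from the current evaluation function.
    return int(np.argmax(R[s] + gamma * P[s] @ V))

def incremental_backup(s):
    # Planning component: one asynchronous Bellman backup at state s.
    V[s] = np.max(R[s] + gamma * P[s] @ V)

# Interleave acting and planning: actions incur minimal delay, while
# repeated backups drive V (and the greedy policy) toward optimality.
s = 0
for _ in range(10_000):
    a = greedy_action(s)
    incremental_backup(s)                      # plan a little
    s = int(rng.choice(n_states, p=P[s, a]))   # act in the world

With P and R known, this is asynchronous value iteration; when the model must instead be learned from experience, the same interleaving is what the Dyna work below develops.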
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
TLDR
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents and shows results for two Dyna architectures, based on Watkins's Q-learning, a new kind of reinforcement learning.
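Since Dyna recurs throughout this list, a minimal tabular Dyna-Q sketch may help fix ideas. It assumes a deterministic environment exposing a function step(s, a) -> (reward, next_state) and a terminal goal state; these names are illustrative assumptions, not the paper's API.

import random
from collections import defaultdict

def dyna_q(step, actions, start, goal, episodes=100, n_planning=10,
           alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)   # Q[(s, a)]: learned action values
    model = {}               # model[(s, a)] = (r, s2): learned deterministic model

    def policy(s):
        # epsilon-greedy action selection from the current Q
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = start
        while s != goal:
            a = policy(s)
            r, s2 = step(s, a)                            # real experience
            target = r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            model[(s, a)] = (r, s2)                       # update the model
            for _ in range(n_planning):                   # planning: simulated experience
                ps, pa = random.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                ptarget = pr + gamma * max(Q[(ps2, b)] for b in actions)
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q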
Planning in Artificial Intelligence
TLDR
This chapter presents the classical propositional STRIPS planning language and describes the Markov Decision Process (MDP) framework, initially proposed in the Operations Research community before the AI community adopted it as a framework for planning under uncertainty.
Control Strategies for a Stochastic Planner
TLDR
New algorithms for local planning over Markov decision processes are presented and shown to expand the agent's knowledge where the world warrants it, with appropriate responsiveness to time pressure and randomness; an introspective algorithm uses an internal representation of what computational work has already been done.
Reinforcement Learning and Automated Planning: A Survey
TLDR
A detailed survey of Artificial Intelligence approaches that combine Reinforcement Learning and Automated Planning, organized and presented according to various characteristics, such as the planning mechanism used or the reinforcement learning algorithm.
Reinforcement Learning with a Hierarchy of Abstract Models
TLDR
Simulations on a set of compositionally-structured navigation tasks show that H-DYNA can learn to solve them faster than conventional RL algorithms, and the abstract models can be used to solve stochastic control tasks.
Efficient learning and planning within the Dyna framework
TLDR
It is proposed that the backups performed in Dyna be prioritized in order to improve its efficiency, and it is demonstrated on simple tasks that using specific prioritizing schemes can lead to significant reductions in computational effort and corresponding improvements in learning performance.
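The prioritization idea sketches naturally as a priority queue over pending backups, ordered by the magnitude of each backup's expected change. The following sketch, in the spirit of prioritized sweeping, assumes a learned deterministic model and a predecessor table; all names are illustrative assumptions, not the paper's notation.

import heapq
import itertools

def prioritized_backups(Q, model, predecessors, actions,
                        n_updates=100, alpha=0.5, gamma=0.95, theta=1e-4):
    # model[(s, a)] = (r, s2); predecessors[s] = set of (s0, a0) that lead to s.
    pq = []                    # max-priority queue via negated keys
    tie = itertools.count()    # tiebreaker so heap entries never compare states

    def push(s, a):
        r, s2 = model[(s, a)]
        p = abs(r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        if p > theta:                              # only queue backups worth doing
            heapq.heappush(pq, (-p, next(tie), (s, a)))

    for (s, a) in model:                           # seed with all known pairs
        push(s, a)

    for _ in range(n_updates):
        if not pq:
            break
        _, _, (s, a) = heapq.heappop(pq)
        r, s2 = model[(s, a)]
        target = r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        for (s0, a0) in predecessors.get(s, ()):   # propagate change backward
            push(s0, a0)
    return Q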
WP-DYNA: Planning and Reinforcement Learning in Well-Plannable Environments
Reinforcement learning (RL) involves sequential decision making in uncertain environments. The aim of the decision-making agent is to maximize the benefit of acting in its environment over an…
Incremental Learning of Trust while Reacting and Planning
The general idea of the proposed approach is to integrate simple reactive intelligence acquired by experimentation together with planning and learning processes. The autonomous agent [1] can be…
Route Planning and Learning from Execution
TLDR
This paper describes a method for route planning and dynamic update of the information available in a map, and shows how a real traversal of the route is a learning opportunity to refine the domain information and produce better routes.

References

Showing 1-10 of 37 references
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
TLDR
This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods, and presents and shows results for two Dyna architectures, based on Watkins's Q-learning, a new kind of reinforcement learning.
An Analysis of Time-Dependent Planning
This paper presents a framework for exploring issues in time-dependent planning: planning in which the time available to respond to predicted events varies, and the decision making required to…
Universal Plans for Reactive Robots in Unpredictable Environments
TLDR
This paper describes a new kind of plan, called a "universal plan", which can be synthesized automatically, yet generates appropriate behavior in unpredictable environments, and explicitly identifies predicates requiring monitoring at each moment of execution.
Real-Time Heuristic Search
  • R. Korf
  • Mathematics, Computer Science
  • Artif. Intell.
  • 1990
TLDR
This paper presents a variation of minimax lookahead search, an analog of alpha-beta pruning that significantly improves the efficiency of the algorithm, and a new algorithm, called Real-Time-A∗, for interleaving planning and execution; it proves that the algorithm makes locally optimal decisions and is guaranteed to find a solution.
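Korf's Real-Time-A∗ admits a short sketch: move toward the neighbor with the lowest estimated total cost, but record the second-best estimate for the state being left, which is what keeps revisits locally optimal. The graph interface below (neighbors, cost, h, is_goal) is an illustrative assumption, not Korf's notation.

def rta_star(start, is_goal, neighbors, cost, h, max_steps=10_000):
    # H holds learned heuristic overrides for visited states.
    H = {}
    s, path = start, [start]
    for _ in range(max_steps):
        if is_goal(s):
            return path
        # Estimated cost to the goal through each neighbor, under current knowledge.
        f = sorted(((cost(s, s2) + H.get(s2, h(s2)), s2) for s2 in neighbors(s)),
                   key=lambda t: t[0])
        best_f, best_s = f[0]
        # Store the second-best estimate for s (or the best, if s has one neighbor).
        H[s] = f[1][0] if len(f) > 1 else best_f
        s = best_s
        path.append(s)
    return None  # step budget exhausted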
Efficient memory-based learning for robot control
TLDR
A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action.
Becoming Increasingly Reactive
TLDR
A robot control architecture which combines a stimulus-response subsystem for rapid reaction, with a search-based planner for handling unanticipated situations, and results are presented demonstrating its ability to reduce routine reaction time for a simple mobile robot from minutes to under a second.
Self-improving reactive agents: case studies of reinforcement learning frameworks
TLDR
This paper describes the learning agents and their performance, and summarizes the learning algorithms and the lessons I learned from this study.
Lookahead Planning and Latent Learning in a Classifier System
TLDR
A classifier system that is able to learn and use internal models both to greatly decrease the time to learn general sequential decision tasks and to enable the system to exhibit latent learning is described.
Execution Architectures and Compilation
TLDR
It is proposed that systems with the ability to learn, use and transform between all the types of knowledge may be able to achieve simultaneously higher levels of competence, efficiency and flexibility.
Learning and Sequential Decision Making
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods is related to the theory of sequential decision making. TD methods have…
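The TD idea itself reduces to a one-line update. Here is a minimal TD(0) prediction sketch, assuming a stream of (state, reward, next_state) transitions; this interface is an illustrative assumption, not the report's notation.

from collections import defaultdict

def td0(transitions, alpha=0.1, gamma=0.95):
    # TD(0): nudge each state's value toward the one-step bootstrapped target.
    V = defaultdict(float)
    for s, r, s2 in transitions:
        V[s] += alpha * (r + gamma * V[s2] - V[s])
    return V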