Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning


Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options—closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning. Formally, a set of options defined over an MDP constitutes a semi-Markov decision process (SMDP), and the theory of SMDPs provides the foundation for the theory of options. However, the most interesting issues concern the interplay between the underlying MDP and the SMDP and are thus beyond SMDP theory. We present results for three such cases: (1) we show that the results of planning with options can be used during execution to interrupt options and thereby perform even better than planned, (2) we introduce new intra-option methods that are able to learn about an option from fragments of its execution, and (3) we propose a notion of subgoal that can be used to improve the options themselves. All of these results have precursors in the existing literature; the contribution of this paper is to establish them in a simpler and more general setting with fewer changes to the existing reinforcement learning framework. 
In particular, we show that these results can be obtained without committing to (or ruling out) any particular approach to state abstraction, hierarchy, function approximation, or the macro-utility problem.

© 1999 Published by Elsevier Science B.V. All rights reserved.
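The abstract's claim that options can be used interchangeably with primitive actions in Q-learning corresponds to the SMDP Q-learning update, in which the backup for an option that ran for k steps discounts the next value by γ^k. The sketch below is illustrative, not code from the paper: the environment, option names, and function signature are hypothetical, and it assumes a simple tabular setting where primitive actions are just options of duration one.

```python
# Hypothetical sketch of an SMDP Q-learning backup (tabular setting).
# Names ("step", "sprint", smdp_q_update) are illustrative assumptions,
# not taken from the paper.

def smdp_q_update(Q, s, o, reward, s_next, k, options, alpha=0.1, gamma=0.9):
    """One SMDP Q-learning backup after option o ran for k steps from
    state s, yielding cumulative discounted reward `reward` and ending
    in state s_next. Primitive actions are the special case k = 1."""
    best_next = max(Q[(s_next, o2)] for o2 in options)   # max over options, not just primitives
    target = reward + (gamma ** k) * best_next           # duration-aware discounting
    Q[(s, o)] += alpha * (target - Q[(s, o)])
    return Q[(s, o)]

# Tiny usage example: a two-state chain with one primitive action ("step")
# and one temporally extended option ("sprint") that lasts 3 steps.
options = ["step", "sprint"]
Q = {(s, o): 0.0 for s in ["A", "B"] for o in options}
q = smdp_q_update(Q, "A", "sprint", reward=2.71, s_next="B", k=3, options=options)
```

Because options appear in both the action slot and the max over next values, the same update rule covers both one-step primitives and multi-step options, which is what "interchangeably" means here.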

DOI: 10.1016/S0004-3702(99)00052-1


Cite this paper

@article{Sutton1999BetweenMA,
  title={Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning},
  author={Richard S. Sutton and Doina Precup and Satinder P. Singh},
  journal={Artif. Intell.},
  year={1999},
  volume={112},
  pages={181--211}
}