Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options |closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning. Formally, a set of options deened over an MDP constitutes an embedded semi-Markov decision process (SMDP), and the theory of SMDPs provides the foundation for the theory of options. However, the most interesting issues concern the interplay between the underlying MDP and the SMDP and are thus beyond SMDP theory. We present results for three such cases: 1) we show that the results of planning with options can be used during execution to interrupt options and thereby perform even better than planned, 2) we introduce new intra-option methods that are able to learn about an option from fragments of its execution, and 3) we propose a notion of subgoal which can be used to improve the options themselves. All of these results have precursors in the existing literature; the contribution of this paper is to establish them in a simpler and more general setting with fewer changes to the existing reinforcement learning framework. In particular, we show that all the results mentioned above can be obtained without committing to (or ruling out) any particular approach to state abstraction, hierarchy, function approximation, or the utility problem.