Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

@article{Sutton1999BetweenMA,
  title={Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning},
  author={Richard S. Sutton and Doina Precup and Satinder Singh},
  journal={Artif. Intell.},
  year={1999},
  volume={112},
  pages={181-211}
}
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options—closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch…
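The option construct named in the abstract has three components: an initiation set, a closed-loop intra-option policy, and a termination condition. A minimal Python sketch of that triple follows; the class name, field names, and the can_initiate helper are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable

@dataclass
class Option:
    """An option: an initiation set I, a closed-loop intra-option policy pi,
    and a termination condition beta giving the probability of stopping
    in each state (illustrative sketch of the framework's definition)."""
    initiation_set: Set[State]             # I: states where the option may be invoked
    policy: Callable[[State], Action]      # pi(s): action taken while the option runs
    termination: Callable[[State], float]  # beta(s): probability of terminating in s

    def can_initiate(self, s: State) -> bool:
        return s in self.initiation_set
```

For instance, a go-to-hallway option in a rooms-and-hallways gridworld would set initiation_set to the states of the adjoining room, policy to a controller that steps toward the hallway, and termination to 1.0 in the hallway and 0.0 elsewhere.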
Linear options
TLDR
This work develops a knowledge construct, the linear option, which is capable of modeling temporally abstract dynamics in continuous state spaces and shows conditions under which a linear feature set is sufficient for accurately representing the value function of an option policy.
Unified Inter and Intra Options Learning Using Policy Gradient Methods
TLDR
This paper proposes a modular parameterization of intra-option policies together with option termination conditions and the option selection policy (inter options), and shows that these three decision components may be viewed as a unified policy over an augmented state-action space, to which standard policy gradient algorithms may be applied.
Learning Options in Reinforcement Learning
TLDR
This paper empirically explores a simple approach to creating options, based on the intuition that states that are frequently visited on system trajectories could prove to be useful subgoals, and proposes a greedy algorithm for identifying subgoals based on state visitation counts.
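The visitation-count intuition above can be sketched in a few lines; the function name, the top-k selection, and the trajectory format below are assumptions for illustration, not the paper's exact greedy algorithm.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence

State = Hashable

def candidate_subgoals(trajectories: Iterable[Sequence[State]], k: int = 5) -> List[State]:
    """Tally how often each state is visited across trajectories and
    greedily return the k most-visited states as candidate subgoals."""
    visits = Counter(s for trajectory in trajectories for s in trajectory)
    return [s for s, _ in visits.most_common(k)]
```

Calling candidate_subgoals([[0, 1, 2, 3], [0, 1, 4, 3]], k=2) would return the two states shared by both trajectories, e.g. [0, 1].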
Towards Jumpy Planning
Model-free reinforcement learning (RL) is a powerful paradigm for learning complex tasks, but it suffers from poor sample efficiency and remains ignorant of the environment dynamics. On the other hand, …
Temporal Abstraction in Reinforcement Learning with the Successor Representation
TLDR
This paper argues that the successor representation, which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions and takes a big picture view of recent results, showing how it can be used to discover options that facilitate either temporally-extended exploration or planning.
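For reference, the tabular successor representation mentioned above can be learned with a simple TD(0) update; the sketch below is a standard formulation under a fixed policy, with illustrative names, and is not taken from the paper itself.

```python
import numpy as np

def sr_td_update(psi: np.ndarray, s: int, s_next: int,
                 alpha: float = 0.1, gamma: float = 0.99) -> None:
    """TD(0) update of a tabular successor representation psi (n_states x n_states):
    row psi[s] estimates the expected discounted future occupancy of every state
    when starting in s and following the current policy."""
    one_hot = np.zeros(psi.shape[1])
    one_hot[s] = 1.0
    psi[s] += alpha * (one_hot + gamma * psi[s_next] - psi[s])
```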
New Approaches to Temporal Abstraction in Hierarchical Reinforcement Learning
In classical reinforcement learning, planning is done at the level of atomic actions, which is highly laborious for complex tasks. By using temporal abstraction, an agent can construct plans more e…
Automatic Construction of Temporally Extended Actions for MDPs Using Bisimulation Metrics
TLDR
This paper presents a mechanism for automatically constructing temporally extended actions, expressed as options, in a finite Markov Decision Process (MDP), and demonstrates empirically that this approach is able to improve the speed of reinforcement learning and is generally not sensitive to parameter tuning.
From Semantics to Execution: Integrating Action Planning With Reinforcement Learning for Robotic Causal Problem-Solving
TLDR
The paper demonstrates how reward sparsity can serve as a bridge between the high-level and low-level state and action spaces, and shows that the integrated method is able to solve robotic tasks that involve non-trivial causal dependencies under noisy conditions, exploiting both data and knowledge.
Successor Options: An Option Discovery Framework for Reinforcement Learning
TLDR
This work adopts a complementary approach, where it attempts to discover options that navigate to landmark states, which are prototypical representatives of well-connected regions and can hence access the associated region with relative ease, and proposes Successor Options, which leverages Successor Representations to build a model of the state space.
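One way to realize the landmark idea described above is to cluster the rows of a tabular successor representation and take, for each cluster, the state whose SR vector lies closest to the cluster centre. The sketch below (using scikit-learn's KMeans) is an illustrative assumption in that spirit, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def landmark_states(psi: np.ndarray, n_landmarks: int = 4, seed: int = 0) -> np.ndarray:
    """Cluster successor-representation rows and return, per cluster, the state
    whose SR vector is nearest the cluster centre; these act as candidate landmarks."""
    km = KMeans(n_clusters=n_landmarks, n_init=10, random_state=seed).fit(psi)
    landmarks = []
    for c in range(n_landmarks):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(psi[members] - km.cluster_centers_[c], axis=1)
        landmarks.append(int(members[np.argmin(dists)]))
    return np.asarray(landmarks)
```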
Monte Carlo Hierarchical Model Learning
TLDR
T-UCT, a novel model-based RL approach for learning and exploiting the dynamics of structured hierarchical environments, is introduced; it learns hierarchical models with fewer samples than B-VISA, and this advantage is magnified at deeper levels of hierarchical complexity.

References

Showing 1-10 of 125 references
Theoretical Results on Reinforcement Learning with Temporally Abstract Options
TLDR
New Bellman equations satisfied by sets of multi-time models are defined; such models can be used interchangeably with models of primitive actions in a variety of well-known planning methods, including value iteration, policy improvement, and policy iteration.
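For orientation, the option-level Bellman optimality equation has the same shape as the primitive-action one once rewards and transitions are replaced by the option's multi-time model. A sketch, writing r(s,o) and p(s'|s,o) for the model quantities with the discounting folded into p:

```latex
% Sketch of the option-level Bellman optimality equation; r(s,o) and p(s'|s,o)
% denote the multi-time (discount-folded) reward and transition model of option o.
V^{*}(s) \;=\; \max_{o \in \mathcal{O}_s} \Big[\, r(s,o) \;+\; \sum_{s'} p(s' \mid s, o)\, V^{*}(s') \,\Big]
```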
Multi-time Models for Temporally Abstract Planning
TLDR
A more general form of temporally abstract model, the multi-time model, is introduced, and its suitability for planning and learning is established by virtue of its relationship to the Bellman equations.
Improved Switching among Temporally Abstract Actions". In Advances in Neural Information Processing Systems
TLDR
This paper shows how an agent can plan with these high-level controllers and then use the results of such planning to find an even better plan, by modifying the existing controllers, with negligible additional cost and no re-planning.
Improved Switching among Temporally Abstract Actions
TLDR
This paper shows how an agent can plan with these high-level controllers and then use the results of such planning to find an even better plan, by modifying the existing controllers, with negligible additional cost and no re-planning.
Reinforcement Learning with a Hierarchy of Abstract Models
TLDR
Simulations on a set of compositionally-structured navigation tasks show that H-DYNA can learn to solve them faster than conventional RL algorithms, and the abstract models can be used to solve stochastic control tasks.
Intra-Option Learning about Temporally Abstract Actions
TLDR
This paper presents intra-option learning methods for learning value functions over options and for learning multi-time models of the consequences of options, and sketches a convergence proof for intra-option value learning.
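A rough sketch of the intra-option value-learning update for options with deterministic intra-option policies is given below. It assumes `options` is a sequence of objects exposing policy(s) and termination(s) callables (as in the Option sketch near the top of this page); the variable names are illustrative.

```python
import numpy as np

def intra_option_q_update(Q: np.ndarray, options, s: int, a: int, r: float,
                          s_next: int, alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Intra-option Q-learning sketch: after one primitive transition
    (s, a, r, s_next), update every option whose deterministic policy
    would have taken action a in state s."""
    for o, opt in enumerate(options):
        if opt.policy(s) != a:            # option inconsistent with the observed action
            continue
        beta = opt.termination(s_next)    # probability the option terminates in s_next
        backup = (1.0 - beta) * Q[s_next, o] + beta * Q[s_next].max()
        Q[s, o] += alpha * (r + gamma * backup - Q[s, o])
```

Because every consistent option is updated from each primitive transition, experience is reused across options rather than only by the option currently executing.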
Learning to Plan in Continuous Domains
TLDR
It is argued that current models of world change implicitly adopt a discrete action assumption which precludes efficient reasoning about continuous world change, and an ideal continuous domain planner is defined.
Robust Reinforcement Learning in Motion Planning
TLDR
This paper presents a method that uses domain knowledge to reduce the number of failures during exploration and formulates the set of actions from which the RL agent composes a control policy to ensure that exploration is conducted in a policy space that excludes most of the unacceptable policies.
Finding Structure in Reinforcement Learning
TLDR
SKILLS discovers skills, partially defined action policies that arise in the context of multiple, related tasks; skills are learned by maximizing the compactness of action policies, using a description length argument on their representation.
Planning under Time Constraints in Stochastic Domains
TLDR
This work provides a method, based on the theory of Markov decision processes, for efficient planning in stochastic domains, and describes the meta-level control problem of deliberation scheduling, i.e., allocating computational resources to planning routines.