Compositional planning in Markov decision processes: Temporal abstraction meets generalized logic composition

```
@article{liu2019compositional,
  title={Compositional planning in Markov decision processes: Temporal abstraction meets generalized logic composition},
  author={Xuan Liu and Jie Fu},
  journal={2019 American Control Conference (ACC)},
  year={2019}
}
```
In hierarchical planning for Markov decision processes (MDPs), temporal abstraction allows planning with macro-actions that operate at different time scales in the form of sequential composition. In this paper, we propose a novel approach to compositional reasoning and hierarchical planning for MDPs under co-safe temporal logic constraints. In addition to sequential composition, we introduce a composition of policies based on generalized logic composition: given sub-policies for sub-tasks and a…
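One simple instance of logical composition of sub-policies, common in the compositional RL literature, combines sub-task value functions pointwise; this is a hedged sketch of the general idea, not the paper's exact operator (the function name and dict representation are illustrative):

```python
def compose_or(V1, V2):
    """Disjunctive composition: value of 'satisfy sub-task 1 OR sub-task 2'.

    A common approximation: take the pointwise max of the two sub-task
    value functions and act greedily with respect to the composite value.
    """
    return {s: max(V1[s], V2[s]) for s in V1}
```

Acting greedily on the composite value reuses the sub-policies without re-planning the joint task from scratch.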
Planning with State Abstractions for Non-Markovian Task Specifications
The AP-MDP framework translates LTL into its corresponding automata, creates a product Markov Decision Process (MDP) of the LTL specification and the environment MDP, and decomposes the problem into subproblems to enable efficient planning with abstractions.
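The product construction these frameworks share can be sketched as follows; this is a minimal illustration, not the AP-MDP implementation, and all names and data structures are assumptions:

```python
from itertools import product

def product_mdp(mdp_states, mdp_actions, trans, label, dfa_states, dfa_delta):
    """Build the product of an MDP and a DFA obtained from an LTL formula.

    trans:     dict (s, a) -> list of (s_next, prob)
    label:     dict s -> atomic proposition observed in s
    dfa_delta: dict (q, ap) -> q_next
    Product states are pairs (s, q); the DFA component tracks progress
    toward satisfying the specification.
    """
    prod_states = list(product(mdp_states, dfa_states))
    prod_trans = {}
    for (s, q), a in product(prod_states, mdp_actions):
        moves = []
        for s_next, p in trans.get((s, a), []):
            q_next = dfa_delta[(q, label[s_next])]  # DFA reads the new label
            moves.append(((s_next, q_next), p))
        if moves:
            prod_trans[((s, q), a)] = moves
    return prod_states, prod_trans
```

Planning then reduces to a reachability problem over the product: maximize the probability of reaching product states whose DFA component is accepting.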


MDP optimal control under temporal logic constraints
A sufficient condition for a policy to be optimal is proposed, and a dynamic programming algorithm is developed that synthesizes a policy that is optimal under certain conditions and sub-optimal otherwise.
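The dynamic-programming idea behind such syntheses can be sketched as value iteration maximizing the probability of reaching a set of accepting states; this is a minimal sketch under assumed data structures, not the paper's algorithm:

```python
def max_reach_prob(states, actions, trans, goal, iters=100):
    """Value iteration for the max probability of reaching `goal`.

    trans: dict (s, a) -> list of (s_next, prob).
    V[s] converges to the optimal satisfaction probability; a greedy
    policy with respect to V is then optimal for the reachability objective.
    """
    V = {s: 1.0 if s in goal else 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            if s in goal:
                continue  # goal states are absorbing with value 1
            vals = [sum(p * V[sn] for sn, p in trans.get((s, a), []))
                    for a in actions]
            V[s] = max(vals) if vals else 0.0
    return V
```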
Compositional Planning Using Optimal Option Models
This paper presents a unified view of intra- and inter-option model learning, based on a major generalisation of the Bellman equation, that enables compositional planning over many levels of abstraction.
Temporal Logic Motion Planning and Control With Probabilistic Satisfaction Guarantees
We describe a computational framework for automatic deployment of a robot with sensor and actuator noise from a temporal logic specification over a set of properties that are satisfied by the regions…
LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees
The problem of generating a control policy for a Markov Decision Process (MDP) such that the probability of satisfying an LTL formula over its states is maximized can be reduced to the problem of creating a robot control strategy that maximizes the probability of accomplishing a task.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
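In the options framework, a macro-action is a triple of an initiation set, an internal policy, and a termination condition. A minimal sketch of that structure (class and attribute names are illustrative, not from the paper):

```python
import random

class Option:
    """A temporally extended action: (initiation set I, policy pi, termination beta)."""

    def __init__(self, init_set, policy, beta):
        self.init_set = init_set  # states where the option may be invoked
        self.policy = policy      # dict: state -> primitive action
        self.beta = beta          # dict: state -> termination probability

    def run(self, state, step, rng=random):
        """Execute the internal policy until beta terminates the option.

        `step` maps (state, action) -> next state; the returned state is
        where control hands back to the policy over options.
        """
        assert state in self.init_set
        while rng.random() >= self.beta.get(state, 1.0):
            state = step(state, self.policy[state])
        return state
```

A higher-level planner then treats `run` as a single (temporally abstract) transition, which is what lets options and primitive actions be used interchangeably.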
Linear options
This work develops a knowledge construct, the linear option, which is capable of modeling temporally abstract dynamics in continuous state spaces and shows conditions under which a linear feature set is sufficient for accurately representing the value function of an option policy.
Temporal abstraction in reinforcement learning
A general framework for prediction, control and learning at multiple temporal scales is developed, along with the way in which multi-time models can be used to produce plans of behavior very quickly, using classical dynamic programming or reinforcement learning techniques.
Optimal control in Markov decision processes via distributed optimization
  • Jie Fu, Shuo Han, U. Topcu
  • 2015 54th IEEE Conference on Decision and Control (CDC)
This work proposes a decomposition-based distributed synthesis algorithm which automatically exploits, if it exists, the modular structure in a given large-scale system, and illustrates the proposed methods through robotic motion planning examples.
The Option-Critic Architecture
This work derives policy gradient theorems for options and proposes a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals.
Symbolic algorithms for qualitative analysis of Markov decision processes with Büchi objectives
This work presents the first subquadratic symbolic algorithm to compute the almost-sure winning set for MDPs with Büchi objectives, and improves the algorithm for symbolic SCC computation: the previously known algorithm takes a linear number of symbolic steps, and the new algorithm improves the constants associated with that linear number of steps.