Corpus ID: 202750317

Modular Deep Reinforcement Learning with Temporal Logic Specifications

Li Yuan, Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening
We propose an actor-critic, model-free, and online Reinforcement Learning (RL) framework for continuous-state, continuous-action Markov Decision Processes (MDPs) in which the reward is highly sparse but exhibits a high-level temporal structure. We represent this temporal structure by a finite-state machine and construct an on-the-fly synchronised product of the MDP and the machine. The temporal structure acts as a guide for the RL agent within the product, where a modular Deep…
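The synchronised product described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: all names (`ProductMDP`, `labeller`, `automaton`) are hypothetical, the automaton is a plain transition dictionary, and the sparse task reward is replaced by a simple reward on reaching accepting automaton states.

```python
# Hypothetical sketch of an on-the-fly synchronised product:
# the agent acts in the MDP while a finite-state machine tracks
# progress through the temporal task; a modular approach would keep
# one policy per automaton state.

class ProductMDP:
    """Tracks a finite-state machine alongside an MDP trajectory.

    `automaton` maps (fsm_state, label) -> next fsm_state;
    `labeller` maps an MDP state to the atomic proposition it satisfies.
    """

    def __init__(self, automaton, labeller, initial_fsm_state, accepting):
        self.automaton = automaton
        self.labeller = labeller
        self.fsm_state = initial_fsm_state
        self.accepting = accepting

    def step(self, mdp_state):
        # Synchronise: read the label of the observed MDP state and
        # advance the automaton (on the fly, no explicit full product).
        label = self.labeller(mdp_state)
        self.fsm_state = self.automaton.get((self.fsm_state, label),
                                            self.fsm_state)
        # Sparse task reward becomes guidance: reward accepting states.
        reward = 1.0 if self.fsm_state in self.accepting else 0.0
        return self.fsm_state, reward


# Usage: the task "reach A, then reach B" as a 3-state machine.
automaton = {(0, "A"): 1, (1, "B"): 2}
labeller = lambda s: s  # assume the MDP state *is* its label here
product = ProductMDP(automaton, labeller, initial_fsm_state=0, accepting={2})

for obs in ["C", "A", "C", "B"]:
    q, r = product.step(obs)
```

In the modular setting, the automaton state `q` would index which sub-policy (e.g. which actor-critic pair) is active at each step.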
Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions
Mingyu Cai, C. Vasile. ArXiv, 2021.
Proposes a learning-based control framework consisting of an innovative reward scheme for RL agents, with the formal guarantee that globally optimal policies maximize the probability of satisfying the LTL specification, and an ECBF-based modular deep RL algorithm that achieves near-perfect success rates and safety guarding with high-probability confidence during training.
Deep Reinforcement Learning with Temporal Logics
This work proposes a deep Reinforcement Learning method for policy synthesis in continuous-state/action unknown environments, under requirements expressed in Linear Temporal Logic (LTL), and shows that this combination lifts the applicability of deep RL to complex temporal and memory-dependent policy synthesis goals.
Lifelong Reinforcement Learning with Temporal Logic Formulas and Reward Machines
Experimental results show that LSRM outperforms methods that learn the target tasks from scratch, by exploiting task decomposition via SLTL and knowledge transfer over reward machines (RMs) during the lifelong learning process.
Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning
A key contribution of the paper is to leverage classical convergence results for reinforcement learning on finite MDPs and to provide control strategies that maximize the probability of satisfaction over unknown, continuous-space MDPs while providing probabilistic closeness guarantees.
A Framework for Transforming Specifications in Reinforcement Learning
Develops a formal framework for defining transformations among RL tasks with different forms of objectives, defines the notion of sampling-based reduction to relate two MDPs whose transition probabilities can be learnt by sampling, and formalizes the preservation of optimal policies, convergence, and robustness.
Inverse Reinforcement Learning of Autonomous Behaviors Encoded as Weighted Finite Automata
This paper uses a spectral learning approach to extract a weighted finite automaton, approximating the unknown logic structure of the task, and demonstrates that the method is capable of generalizing the execution of the inferred task specification to new environment configurations.
The Logical Options Framework
This work introduces a hierarchical reinforcement learning framework, the Logical Options Framework (LOF), that learns policies that are satisfying, optimal, and composable, and that can be composed to satisfy unseen tasks with only 10-50 retraining steps on benchmarks.
Towards Verifiable and Safe Model-Free Reinforcement Learning
This line of work addresses these issues by proposing a general framework that leverages the success of RL in learning high-performance controllers while guaranteeing the satisfaction of given requirements and guiding the learning process within safe configurations.
Cautious Reinforcement Learning with Logical Constraints
This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Safe Reinforcement Learning through Meta-learned Instincts
The results suggest that meta-learning augmented with an instinctual network is a promising new approach for safe AI, which may enable progress in this area on a variety of different domains.


Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Teaching Multiple Tasks to an RL Agent using LTL
This paper uses Linear Temporal Logic as a language for specifying multiple tasks in a manner that supports the composition of learned skills and proposes a novel algorithm that exploits LTL progression and off-policy RL to speed up learning without compromising convergence guarantees.
Hierarchical Relative Entropy Policy Search
This work defines the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy composed of a high-level gating policy that selects low-level sub-policies for execution by the agent, and treats the sub-policies as latent variables, which allows update information to be distributed between them.
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
This work presents h-DQN, a framework that integrates hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, and allows for flexible goal specifications, such as functions over entities and relations.
Modular Multitask Reinforcement Learning with Policy Sketches
Experiments show that using the approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.
Improving Stability in Deep Reinforcement Learning with Weight Averaging
Deep reinforcement learning (RL) methods are notoriously unstable during training. In this paper, we focus on model-free RL algorithms where we observe that the average reward is unstable throughout…
Strategic Attentive Writer for Learning Macro-Actions
A novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner purely by interacting with an environment in a reinforcement learning setting, and which is at the same time a general algorithm that can be applied to any sequence data.
Logically-Constrained Neural Fitted Q-Iteration
We propose a method for efficient training of Q-functions for continuous-state Markov Decision Processes (MDPs) such that the traces of the resulting policies satisfy a given Linear Temporal Logic…
Temporal abstraction in reinforcement learning
Develops a general framework for prediction, control, and learning at multiple temporal scales, and shows how multi-time models can be used to produce plans of behavior very quickly using classical dynamic programming or reinforcement learning techniques.
Playing Atari with Deep Reinforcement Learning
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.