Corpus ID: 202750317

Modular Deep Reinforcement Learning with Temporal Logic Specifications

@article{Yuan2019ModularDR,
  title={Modular Deep Reinforcement Learning with Temporal Logic Specifications},
  author={Li Yuan and Mohammadhosein Hasanbeig and Alessandro Abate and Daniel Kroening},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.11591}
}
We propose an actor-critic, model-free, online Reinforcement Learning (RL) framework for continuous-state, continuous-action Markov Decision Processes (MDPs) whose reward is highly sparse but encodes a high-level temporal structure. We represent this temporal structure by a finite-state machine and construct an on-the-fly synchronised product of the MDP with the finite-state machine. Within the product, the temporal structure acts as a guide for the RL agent, where a modular Deep…
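As a concrete illustration of the synchronised-product construction described above, the sketch below wraps a toy MDP in an on-the-fly product with a finite-state machine. The environment, labelling function, and sparse reward on acceptance are illustrative assumptions, not the paper's implementation:

```python
# Sketch: on-the-fly synchronised product of an MDP with a finite-state
# machine tracking the temporal task. All names here (LineEnv, the
# transition table, the labeller) are hypothetical examples.

class Automaton:
    """Finite-state machine over atomic-proposition labels."""
    def __init__(self, transitions, initial, accepting):
        self.transitions = transitions  # (state, label) -> next state
        self.initial = initial
        self.accepting = accepting

    def step(self, state, label):
        # Self-loop on labels with no outgoing transition.
        return self.transitions.get((state, label), state)

class LineEnv:
    """Toy 1-D MDP: the agent moves left/right on integer positions 0..4."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action in {-1, +1}
        self.pos = max(0, min(4, self.pos + action))
        return self.pos, 0.0, False

class ProductEnv:
    """Wraps an environment so its state is (mdp_state, automaton_state).

    The automaton is advanced on the fly from the label of each visited
    MDP state; a sparse reward is given when it reaches acceptance."""
    def __init__(self, env, automaton, labeller):
        self.env, self.aut, self.labeller = env, automaton, labeller

    def reset(self):
        self.q = self.aut.initial
        return self.env.reset(), self.q

    def step(self, action):
        s, _, done = self.env.step(action)
        self.q = self.aut.step(self.q, self.labeller(s))
        reward = 1.0 if self.q in self.aut.accepting else 0.0
        return (s, self.q), reward, done or self.q in self.aut.accepting
```

With a one-transition automaton ("reach the goal cell"), moving right four times yields zero reward until the product's automaton component reaches its accepting state, which is exactly the sparse-but-structured reward the framework exploits.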
Citations

Deep Reinforcement Learning with Temporal Logics
This work proposes a deep Reinforcement Learning method for policy synthesis in continuous-state/action unknown environments, under requirements expressed in Linear Temporal Logic (LTL), and shows that this combination lifts the applicability of deep RL to complex temporal and memory-dependent policy-synthesis goals.
Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning
A key contribution of the paper is to leverage classical convergence results for reinforcement learning on finite MDPs and to provide control strategies that maximise the probability of satisfaction over unknown, continuous-space MDPs, with probabilistic closeness guarantees.
Inverse Reinforcement Learning of Autonomous Behaviors Encoded as Weighted Finite Automata
This paper uses a spectral learning approach to extract a weighted finite automaton approximating the unknown logical structure of the task, and demonstrates that the method generalises the execution of the inferred task specification to new environment configurations.
The Logical Options Framework
This work introduces a hierarchical reinforcement learning framework, the Logical Options Framework (LOF), that learns policies which are satisfying, optimal, and composable, and which can be composed to satisfy unseen tasks with only 10-50 retraining steps on benchmarks.
Towards Verifiable and Safe Model-Free Reinforcement Learning
This line of work addresses these issues by proposing a general framework that leverages the success of RL in learning high-performance controllers, while guaranteeing the satisfaction of given requirements and keeping the learning process within safe configurations.
Cautious Reinforcement Learning with Logical Constraints
This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Policies…
Safe Reinforcement Learning through Meta-learned Instincts
The results suggest that meta-learning augmented with an instinctual network is a promising new approach for safe AI, and may enable progress in this area across a variety of domains.
Modular deep reinforcement learning from reward and punishment for robot navigation
This work discusses the scaling of reward and punishment signals relative to the discount factor γ, proposes a weak constraint for signal design, and introduces a state-value-dependent weighting scheme that automatically tunes the mixing weights (hard-max and softmax), based on a case analysis of the Boltzmann distribution.
Jump Operator Planning: Goal-Conditioned Policy Ensembles and Zero-Shot Transfer
This work proposes a novel hierarchical and compositional framework, Jump-Operator Dynamic Programming, for quickly computing solutions within a super-exponential space of sequential sub-goal tasks with ordering constraints, together with a fast, linearly-solvable algorithm as an implementation.
Safer Reinforcement Learning through Transferable Instinct Networks
This work demonstrates an approach in which an additional policy can override the main policy and offer a safer alternative action; in the OpenAI Safety Gym domain, the approach incurs significantly fewer safety violations during training than a baseline RL approach while reaching similar task performance.

References

Showing 1-10 of 56 references
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many tasks the algorithm can learn policies end-to-end, directly from raw pixel inputs.
Teaching Multiple Tasks to an RL Agent using LTL
This paper uses Linear Temporal Logic as a language for specifying multiple tasks in a manner that supports the composition of learned skills, and proposes a novel algorithm that exploits LTL progression and off-policy RL to speed up learning without compromising convergence guarantees.
Hierarchical Relative Entropy Policy Search
This work frames learning sub-policies in continuous state-action spaces as finding a hierarchical policy composed of a high-level gating policy that selects low-level sub-policies for execution by the agent, and treats the sub-policies as latent variables, which allows update information to be distributed among them.
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
h-DQN is presented, a framework that integrates hierarchical value functions operating at different temporal scales with intrinsically motivated deep reinforcement learning, allowing flexible goal specifications such as functions over entities and relations.
Modular Multitask Reinforcement Learning with Policy Sketches
Experiments show that learning policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviours that can be recombined to rapidly adapt to new tasks.
Improving Stability in Deep Reinforcement Learning with Weight Averaging
Deep reinforcement learning (RL) methods are notoriously unstable during training. In this paper, we focus on model-free RL algorithms where we observe that the average reward is unstable throughout…
Strategic Attentive Writer for Learning Macro-Actions
A novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner, purely by interacting with an environment in the reinforcement learning setting; it is at the same time a general algorithm that can be applied to any sequence data.
Logically-Constrained Neural Fitted Q-Iteration
We propose a method for efficient training of Q-functions for continuous-state Markov Decision Processes (MDPs) such that the traces of the resulting policies satisfy a given Linear Temporal Logic…
Temporal abstraction in reinforcement learning
A general framework for prediction, control, and learning at multiple temporal scales is developed, together with the way multi-time models can be used to produce plans of behaviour very quickly, using classical dynamic programming or reinforcement learning techniques.
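The temporal-abstraction idea summarised above is commonly formalised as options: a temporally extended action with an initiation set, an intra-option policy, and a termination condition. The sketch below is a minimal illustrative rendering of that structure (all names hypothetical), not the cited framework's implementation:

```python
# Sketch of the options abstraction for temporally extended actions.
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Option:
    """Temporally extended action."""
    can_start: Callable[[int], bool]   # initiation set: where the option may begin
    policy: Callable[[int], int]       # intra-option policy: primitive action per state
    terminate: Callable[[int], bool]   # termination condition per state

def run_option(state: int,
               option: Option,
               step: Callable[[int, int], Tuple[int, float]],
               max_len: int = 100) -> Tuple[int, float]:
    """Execute `option` from `state` with primitive transition function `step`
    until it terminates; returns (final_state, accumulated_reward)."""
    assert option.can_start(state), "option started outside its initiation set"
    total = 0.0
    for _ in range(max_len):
        state, reward = step(state, option.policy(state))
        total += reward
        if option.terminate(state):
            break
    return state, total
```

On a toy chain MDP with a per-step cost, a "walk right until position 5" option runs five primitive steps and returns control with the accumulated reward, which is the sense in which multi-time models let a planner reason over jumps rather than single steps.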
Playing Atari with Deep Reinforcement Learning
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning; it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.