• Corpus ID: 226289821

Model-based Reinforcement Learning from Signal Temporal Logic Specifications

Parv Kapoor, Anand Balakrishnan, Jyotirmoy V. Deshmukh
Techniques based on Reinforcement Learning (RL) are increasingly being used to design control policies for robotic systems. RL fundamentally relies on state-based reward functions to encode the desired behavior of the robot, and poorly designed reward functions are prone to exploitation by the learning agent, leading to behavior that is undesirable at best and critically dangerous at worst. Designing good reward functions for complex tasks is, however, a challenging problem. In this paper… 
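The paper's setting builds on the quantitative (robustness) semantics of Signal Temporal Logic, which scores how strongly a trace satisfies or violates a formula and can therefore serve as a reward signal. A minimal sketch for two common one-predicate formulas over a finite 1-D trace; the function names and the threshold form `x < c` are illustrative, not taken from the paper:

```python
import numpy as np

def rob_always_lt(trace, c):
    """Robustness of G(x < c): the worst-case margin c - x over the trace.
    Positive iff every sample stays strictly below c."""
    return float(np.min(c - np.asarray(trace)))

def rob_eventually_gt(trace, c):
    """Robustness of F(x > c): the best-case margin x - c over the trace.
    Positive iff some sample exceeds c."""
    return float(np.max(np.asarray(trace) - c))

# A trace that stays below 1.0 satisfies G(x < 1.0) with positive robustness,
# and that robustness value can be used directly as an episode reward.
trace = [0.2, 0.5, 0.8]
reward = rob_always_lt(trace, 1.0)   # min of (0.8, 0.5, 0.2) = 0.2
```

Because robustness is a real number rather than a boolean, the learner receives a graded signal: trajectories that barely violate the specification score close to zero rather than being indistinguishable from catastrophic ones.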


Policy Synthesis for Metric Interval Temporal Logic with Probabilistic Distributions
A procedure that translates a specification into a stochastic timed automaton and an approximate-optimal probabilistic planning problem, synthesizing a control policy that maximizes the probability that the planning agent achieves the task, provided that the external events satisfy the specification.
Vehicle Trajectory Prediction Using Generative Adversarial Network With Temporal Logic Syntax Tree Features
A framework based on generative adversarial networks that uses tools from formal methods, namely Signal Temporal Logic and syntax trees, to leverage information on rule obedience as features in neural networks, improving prediction accuracy without biasing towards lawful behavior.
From English to Signal Temporal Logic
The authors propose DeepSTL, a tool and technique for translating informal requirements, given as free English sentences, into Signal Temporal Logic (STL), a formal specification language for cyber-physical systems used by both academia and advanced industrial research labs.
Deep Reinforcement Learning Based Networked Control with Network Delays for Signal Temporal Logic Specifications
This work proposes an extended Markov decision process (MDP) that incorporates past system states and control actions, called a τd-MDP, so that the agent can evaluate satisfaction of the STL formula while accounting for network delays.
DeepSTL -- From English Requirements to Signal Temporal Logic
The authors propose DeepSTL, a tool and technique for translating informal requirements, given as free English sentences, into Signal Temporal Logic (STL), a formal specification language for cyber-physical systems used by both academia and advanced industrial research labs.
Model-Based Safe Policy Search from Signal Temporal Logic Specifications Using Recurrent Neural Networks
This work proposes a policy search approach to learn controllers from specifications given as Signal Temporal Logic (STL) formulae, and uses control barrier functions (CBFs) with the learned model to improve the safety of the system.


Structured Reward Shaping using Signal Temporal Logic specifications
This paper proposes the use of the logical formalism of Signal Temporal Logic (STL) as a formal specification for the desired behaviors of the agent and proposes algorithms to locally shape rewards in each state with the goal of satisfying the high-level STL specification.
Reinforcement learning with temporal logic rewards
  • Xiao Li, C. Vasile, C. Belta
  • Computer Science
    2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2017
It is shown in simulated trials that learning is faster and policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied.
Gaussian Processes for Data-Efficient Learning in Robotics and Control
This paper learns a probabilistic, non-parametric Gaussian process transition model of the system and applies it to autonomous learning in real robot and control tasks, achieving an unprecedented speed of learning.
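The entry above learns a probabilistic, non-parametric GP transition model. A minimal numpy sketch of GP posterior prediction for a toy (state, action) → next-state regression; the squared-exponential kernel, lengthscale, noise level, and toy dynamics are all illustrative assumptions:

```python
import numpy as np

def rbf(A, B, ell=0.5):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-2):
    """GP posterior mean and standard deviation at test inputs Xte."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xte, Xtr)
    mean = Ks @ np.linalg.solve(K, ytr)
    cov = rbf(Xte, Xte) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Toy dynamics data: next state x' = 0.9 x + 0.1 u on (state, action) inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 0.9 * X[:, 0] + 0.1 * X[:, 1]

mean, std = gp_posterior(X, y, np.array([[0.5, 0.0]]))
# mean[0] lands near the true next state 0.45; std[0] quantifies model
# uncertainty, which a data-efficient planner can exploit to act cautiously.
```

The predictive variance is the key output: it is what lets this family of methods plan through the model while accounting for what the model does not yet know.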
Logically-Constrained Reinforcement Learning
This work presents the first model-free Reinforcement Learning (RL) algorithm to synthesize policies for an unknown Markov Decision Process (MDP) such that a linear-time property is satisfied, and proves that the algorithm is guaranteed to find a policy whose traces probabilistically satisfy the LTL property if such a policy exists.
Safe Model-based Reinforcement Learning with Stability Guarantees
This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.
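The Lyapunov-based safety argument above can be illustrated by checking the discrete-time decrease condition V(f(x)) < V(x) on sampled states; states where the condition holds are certified as shrinking toward the equilibrium. The toy stable linear closed-loop system and the identity-weighted quadratic V below are illustrative choices, not the paper's construction:

```python
import numpy as np

# Toy closed-loop dynamics x_{t+1} = A x_t (a stabilizing controller is
# assumed to be folded into A; both eigenvalues lie inside the unit circle).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])

def V(x):
    """Candidate quadratic Lyapunov function V(x) = x^T x."""
    return float(x @ x)

def decreases(x):
    """Discrete-time Lyapunov decrease condition: V(A x) < V(x)."""
    return V(A @ x) < V(x)

# Sample states and keep those certified by the decrease condition; for this
# stable A the condition holds on every nonzero sample.
rng = np.random.default_rng(1)
samples = rng.uniform(-1, 1, size=(500, 2))
safe = [x for x in samples if decreases(x)]
```

In the learning setting, the same check is applied to a statistical model of the dynamics, so the certified region grows only as the model becomes confident enough to support it.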
Continuous Deep Q-Learning with Model-based Acceleration
This paper derives a continuous variant of the Q-learning algorithm, which it is called normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.
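The core of the normalized advantage function is restricting the advantage to a quadratic in the action, so the greedy (maximizing) action is available in closed form as the predicted mean. A minimal sketch with fixed numbers standing in for the network outputs at one state (all values illustrative):

```python
import numpy as np

def naf_q(u, mu, L, v):
    """Normalized advantage function Q-value: Q(x, u) = V(x) + A(x, u) with
    A(x, u) = -0.5 (u - mu)^T P (u - mu) and P = L L^T positive definite.
    Here mu, L, v stand in for network outputs evaluated at a state x."""
    P = L @ L.T
    d = u - mu
    return v - 0.5 * d @ P @ d

mu = np.array([0.3, -0.1])        # greedy action predicted at state x
L = np.array([[1.0, 0.0],
              [0.2, 0.5]])        # Cholesky factor parameterizing P
v = 1.0                           # state value V(x)

# The maximizing action is mu itself, where the advantage vanishes,
# so Q(x, mu) = V(x); any other action scores strictly lower.
q_at_mu = naf_q(mu, mu, L, v)
q_off = naf_q(mu + 0.5, mu, L, v)
```

This closed-form argmax is what lets Q-learning operate over continuous action spaces without an inner optimization loop.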
Efficient memory-based learning for robot control
A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered, permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, fast generation of a candidate action.
Barrier-Certified Adaptive Reinforcement Learning With Applications to Brushbot Navigation
A safe learning framework that employs an adaptive model learning algorithm together with barrier certificates for systems with possibly nonstationary agent dynamics; solutions to the barrier-certified policy optimization are guaranteed to be globally optimal, ensuring greedy policy improvement under mild conditions.
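For intuition, a discrete-time barrier certificate can be reduced, for a 1-D single integrator, to a closed-form action filter that minimally modifies the learner's action. The system, safe set, and decay rate α below are illustrative assumptions, far simpler than the brushbot setting:

```python
def cbf_filter(x, u_nom, alpha=0.5):
    """Minimal discrete-time control barrier certificate for the system
    x_{t+1} = x_t + u_t with safe set {x : h(x) = x >= 0}.
    The barrier condition h(x + u) >= (1 - alpha) * h(x) reduces to
    u >= -alpha * x, so the filter clips the nominal action to that bound."""
    return max(u_nom, -alpha * x)

# Near the boundary, an unsafe nominal action is clipped so x stays >= 0...
u = cbf_filter(x=0.1, u_nom=-1.0)    # -> -0.05, hence x + u = 0.05 >= 0
# ...while actions that already respect the barrier pass through unchanged.
u2 = cbf_filter(x=2.0, u_nom=-0.3)   # -> -0.3
```

The general case replaces this scalar clip with a small quadratic program, but the structure is the same: safety is enforced as a per-step constraint on the action, not baked into the reward.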
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
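The deterministic policy gradient at the heart of this algorithm updates the actor through the critic's action-gradient: ∇θ J = E[∇a Q(s, a)|a=μ(s) · ∇θ μ(s)]. A toy sketch with a known quadratic critic and a linear actor in place of neural networks (all specifics illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Known toy critic Q(s, a) = -(a - s)^2: the best action equals the state.
def dQ_da(s, a):
    return -2.0 * (a - s)

# Deterministic linear actor mu_theta(s) = theta * s; optimal theta is 1.0.
theta = 0.0
for _ in range(200):
    s = rng.uniform(-1, 1)
    a = theta * s
    # Deterministic policy gradient step: dJ/dtheta = dQ/da * dmu/dtheta,
    # and dmu/dtheta = s for this linear actor.
    theta += 0.1 * dQ_da(s, a) * s

# theta converges toward 1.0, the optimal linear policy.
```

The full algorithm adds a learned critic, replay buffers, and target networks, but the actor update is exactly this chain rule through the critic.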
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.
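The entry above combines a probabilistic model ensemble with particle-based trajectory sampling inside a sampling-based planner. A heavily simplified 1-D sketch, with three hand-written "ensemble members" standing in for learned probabilistic networks and random shooting in place of the CEM optimizer (all specifics are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# "Ensemble" of learned 1-D dynamics models: slightly different estimates
# of the true system x' = x + a (stand-ins for bootstrapped neural nets).
ensemble = [lambda x, a, b=b: x + a + b for b in (-0.02, 0.0, 0.02)]

def evaluate(x0, actions):
    """Trajectory sampling: roll each particle forward, drawing a random
    ensemble member per step, and return the mean return (negative final
    distance to the goal state 1.0)."""
    returns = []
    for _ in range(20):                        # particles
        x = x0
        for a in actions:
            model = ensemble[rng.integers(len(ensemble))]
            x = model(x, a)
        returns.append(-abs(x - 1.0))
    return np.mean(returns)

# Random-shooting MPC: sample action sequences, keep the best under the
# uncertainty-aware model rollouts; the best sequence's actions sum to ~1.0.
x0 = 0.0
candidates = rng.uniform(-0.5, 0.5, size=(200, 4))
best = max(candidates, key=lambda seq: evaluate(x0, seq))
```

Propagating particles through randomly chosen ensemble members is what distinguishes this scheme from planning through a single deterministic model: action sequences that only succeed under one optimistic model are penalized.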