# Model-based Reinforcement Learning from Signal Temporal Logic Specifications

@article{Kapoor2020ModelbasedRL, title={Model-based Reinforcement Learning from Signal Temporal Logic Specifications}, author={Parv Kapoor and Anand Balakrishnan and Jyotirmoy V. Deshmukh}, journal={ArXiv}, year={2020}, volume={abs/2011.04950} }

Techniques based on Reinforcement Learning (RL) are increasingly being used to design control policies for robotic systems. RL fundamentally relies on state-based reward functions to encode the desired behavior of the robot, and bad reward functions are prone to exploitation by the learning agent, leading to behavior that is undesirable in the best case and critically dangerous in the worst. On the other hand, designing good reward functions for complex tasks is a challenging problem. In this paper…

## 6 Citations

Policy Synthesis for Metric Interval Temporal Logic with Probabilistic Distributions

- Computer Science, ArXiv
- 2021

A procedure to translate a specification into a stochastic timed automaton and an approximate-optimal probabilistic planning problem for synthesizing the control policy that maximizes the probability for the planning agent to achieve the task, provided that the external events satisfy the specification.

Vehicle Trajectory Prediction Using Generative Adversarial Network With Temporal Logic Syntax Tree Features

- Computer Science, IEEE Robotics and Automation Letters
- 2021

A framework based on generative adversarial networks that uses tools from formal methods, namely signal temporal logic and syntax trees, to leverage information on rule obedience as features in neural networks, improving prediction accuracy without biasing towards lawful behavior.

From English to Signal Temporal Logic

- Computer Science, ArXiv
- 2021

This work proposes DeepSTL, a tool and technique for translating informal requirements, given as free English sentences, into Signal Temporal Logic (STL), a formal specification language for cyber-physical systems used by both academia and advanced research labs in industry.

Deep Reinforcement Learning Based Networked Control with Network Delays for Signal Temporal Logic Specifications

- Computer Science, ArXiv
- 2021

This work proposes an extended Markov decision process (MDP) using past system states and control actions, called a τd-MDP, so that the agent can evaluate the satisfaction of the STL formula while accounting for network delays.
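The idea of extending the state with past system states and control actions can be sketched generically as a fixed-length history buffer. This is only an illustration under stated assumptions, not the construction used in the cited work: the class name `HistoryState`, its parameters, and the scalar state and action types are all hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class HistoryState:
    """Hypothetical extended state: the last n_states states and n_actions actions."""

    def __init__(self, n_states: int, n_actions: int,
                 init_state: float, init_action: float = 0.0) -> None:
        # Pre-fill both buffers so the extended state has a fixed length from step 0.
        self.states: Deque[float] = deque([init_state] * n_states, maxlen=n_states)
        self.actions: Deque[float] = deque([init_action] * n_actions, maxlen=n_actions)

    def push(self, state: float, action: float) -> None:
        # Appending to a full deque with maxlen drops the oldest entry.
        self.states.append(state)
        self.actions.append(action)

    def as_tuple(self) -> Tuple[float, ...]:
        # Concatenate past states and past actions into one flat extended state.
        return tuple(self.states) + tuple(self.actions)


history = HistoryState(n_states=3, n_actions=2, init_state=0.0)
history.push(state=1.0, action=0.5)
print(history.as_tuple())
```

A learner that receives `as_tuple()` instead of the latest state alone has access to the recent history of states and actions, which is the kind of information a delayed or time-windowed specification depends on.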

DeepSTL -- From English Requirements to Signal Temporal Logic

- Computer Science
- 2021

This work proposes DeepSTL, a tool and technique for translating informal requirements, given as free English sentences, into Signal Temporal Logic (STL), a formal specification language for cyber-physical systems used by both academia and advanced research labs in industry.

Model-Based Safe Policy Search from Signal Temporal Logic Specifications Using Recurrent Neural Networks

- Computer Science, ArXiv
- 2021

This work proposes a policy search approach to learn controllers from specifications given as Signal Temporal Logic (STL) formulae, and uses control barrier functions (CBFs) with the learned model to improve the safety of the system.

## References


Structured Reward Shaping using Signal Temporal Logic specifications

- Computer Science, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2019

This paper proposes the use of the logical formalism of Signal Temporal Logic (STL) as a formal specification for the desired behaviors of the agent and proposes algorithms to locally shape rewards in each state with the goal of satisfying the high-level STL specification.

Reinforcement learning with temporal logic rewards

- Computer Science, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2017

It is shown in simulated trials that learning is faster and policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied.
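The robustness degree mentioned above can be illustrated with the standard quantitative semantics of STL for two simple formulas over a discrete-time trace. The following is a minimal sketch, not code from any of the listed papers; the function names `always_gt` and `eventually_gt` and the sample trace are hypothetical.

```python
from typing import Sequence


def always_gt(signal: Sequence[float], threshold: float) -> float:
    """Robustness of G (x > threshold): the worst-case margin over the trace."""
    return min(x - threshold for x in signal)


def eventually_gt(signal: Sequence[float], threshold: float) -> float:
    """Robustness of F (x > threshold): the best-case margin over the trace."""
    return max(x - threshold for x in signal)


# Hypothetical trace of a scalar signal sampled at discrete time steps.
trace = [0.5, 1.0, 2.0, 1.5]

# A positive value means the formula is satisfied, a negative value means it is
# violated, and the magnitude indicates by how much.
print(always_gt(trace, 0.0))      # margin for "x always stays above 0"
print(always_gt(trace, 1.0))      # negative: the trace dips below 1
print(eventually_gt(trace, 1.0))  # margin for "x eventually exceeds 1"
```

Because the result is a real number rather than a true/false verdict, it can serve as a reward or cost signal that grades how well a trajectory satisfies a specification.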

Gaussian Processes for Data-Efficient Learning in Robotics and Control

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015

This paper learns a probabilistic, non-parametric Gaussian process transition model of the system and applies it to autonomous learning in real robot and control tasks, achieving an unprecedented speed of learning.

Logically-Constrained Reinforcement Learning

- Computer Science
- 2018

It is proved that the first model-free Reinforcement Learning (RL) algorithm to synthesise policies for an unknown Markov Decision Process (MDP), such that a linear time property is satisfied, is guaranteed to find a policy whose traces probabilistically satisfy the LTL property if such a policy exists.

Safe Model-based Reinforcement Learning with Stability Guarantees

- Computer Science, NIPS
- 2017

This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.

Continuous Deep Q-Learning with Model-based Acceleration

- Computer Science, ICML
- 2016

This paper derives a continuous variant of the Q-learning algorithm, called normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.

Efficient memory-based learning for robot control

- Computer Science
- 1990

A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered, thus permitting very quick predictions of the effects of proposed actions and, given a goal behaviour, permitting fast generation of a candidate action.

Barrier-Certified Adaptive Reinforcement Learning With Applications to Brushbot Navigation

- Computer Science, IEEE Transactions on Robotics
- 2019

A safe learning framework that employs an adaptive model learning algorithm together with barrier certificates for systems with possibly nonstationary agent dynamics; solutions to the barrier-certified policy optimization are guaranteed to be globally optimal, ensuring greedy policy improvement under mild conditions.

Continuous control with deep reinforcement learning

- Computer Science, ICLR
- 2016

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

- Computer Science, NeurIPS
- 2018

This paper proposes a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation, which matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples.