Corpus ID: 208268061

Safe Reinforcement Learning via Probabilistic Shields

@article{Jansen2018SafeRL,
  title={Safe Reinforcement Learning via Probabilistic Shields},
  author={N. Jansen and Bettina Könighofer and Sebastian Junges and Alexandru Constantin Serban and Roderick Bloem},
  journal={arXiv: Artificial Intelligence},
  year={2018}
}
This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical… 
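As a rough illustration of the shielding idea described in the abstract (not the paper's exact construction), the sketch below assumes per-state-action safety values eta[s][a], e.g. the probability of avoiding a critical state within some horizon, have been precomputed by model checking an MDP abstraction; the interface names and the relative threshold delta are illustrative.

```python
# Illustrative sketch of a probabilistic shield (names and interfaces are
# assumptions, not the paper's implementation). eta[s][a] holds a
# precomputed safety value for taking action a in state s, e.g. the
# probability of avoiding a critical state within a finite horizon.

def allowed_actions(eta_s, delta):
    """Actions whose safety value stays within a relative factor delta
    of the best safety value achievable in this state."""
    best = max(eta_s.values())
    return [a for a, v in eta_s.items() if v >= delta * best]

def shielded_step(env, agent, eta, delta=0.9):
    """One interaction step: the learner only chooses among actions the
    shield considers sufficiently safe, so unsafe behavior is blocked
    before it reaches the environment."""
    state = env.current_state()
    safe = allowed_actions(eta[state], delta)
    action = agent.choose(state, safe)  # any RL policy restricted to `safe`
    return env.step(action)
```

Because the shield only removes actions whose safety value falls too far below the best available in a state, the learner remains free to optimize its performance objective among the remaining actions.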

Citations

Safe Reinforcement Learning Using Probabilistic Shields

TLDR
The concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability is introduced and used to realize a shield that prevents the agent from taking unsafe actions while optimizing the performance objective.

Adaptive Shielding under Uncertainty

TLDR
A new method is proposed for the efficient computation of a shield that is adaptive to a changing environment and independent of the controller, which may, for instance, take the form of a high-performing reinforcement learning agent.

Formal Language Constraints for Markov Decision Processes

TLDR
A general framework is proposed for augmenting a Markov decision process (MDP) with constraints described in formal languages over sequences of MDP states and agent actions, together with methods for augmenting MDP observations with the state of the constraint automaton for learning; a sketch of this product construction follows.
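As a minimal sketch of that construction, the code below assumes a deterministic constraint automaton and illustrative interfaces; the symbol encoding over MDP states and actions is an assumption.

```python
# Minimal sketch of tracking a formal-language constraint alongside an MDP
# (a product construction); the automaton and symbol encoding are assumptions.

class ConstraintAutomaton:
    def __init__(self, initial_state, transitions, violating_states):
        self.state = initial_state
        self.transitions = transitions  # dict: (automaton_state, symbol) -> automaton_state
        self.violating_states = violating_states

    def advance(self, symbol):
        """Consume one (state, action) symbol of the MDP trajectory and
        report whether the constraint is now violated."""
        self.state = self.transitions[(self.state, symbol)]
        return self.state in self.violating_states

def augmented_observation(mdp_obs, automaton):
    """Expose the automaton state to the learner, so progress toward a
    constraint violation becomes part of the (Markovian) observation."""
    return (mdp_obs, automaton.state)
```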

Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations

TLDR
This paper proposes an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies and adds a regularization term to encourage larger certified regions to enable better exploration.

Model-Free Learning of Safe yet Effective Controllers

TLDR
A model-free reinforcement learning algorithm is proposed that learns a policy that first maximizes the probability of ensuring safety, then the likelihood of satisfying the given LTL specification, and lastly, the sum of discounted Quality of Control rewards.

PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP

TLDR
This work provides the first algorithm to compute mean payoff probably approximately correctly in unknown MDPs, extends it to unknown CTMDPs, and demonstrates its practical nature through experiments on standard benchmarks.

References

SHOWING 1-10 OF 67 REFERENCES

Safe Reinforcement Learning via Shielding

TLDR
This work proposes a new approach to learn optimal policies while enforcing properties expressed in temporal logic by synthesizing a reactive system called a shield that monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification.
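A shield in this sense sits between the learner and the environment and overrides only unsafe choices; a minimal sketch, with the safety check and the fallback policy left abstract:

```python
def shield(state, proposed_action, is_safe, safe_fallback):
    """Forward the learner's action unchanged unless it would violate the
    specification; otherwise substitute a safe alternative."""
    if is_safe(state, proposed_action):
        return proposed_action
    return safe_fallback(state)
```

Since the shield intervenes only on violations, the learner's behavior, and hence its learning signal, is unchanged whenever the proposed action is already safe.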

Safe Exploration in Markov Decision Processes

TLDR
This paper proposes a general formulation of safety through ergodicity, shows that imposing safety by restricting attention to the resulting set of guaranteed safe policies is NP-hard, and presents an efficient algorithm for guaranteed safe, but potentially suboptimal, exploration.

Safety-Constrained Reinforcement Learning for MDPs

TLDR
This work casts controller synthesis for stochastic and partially unknown environments in which safety is essential as a Markov decision process whose expected performance is measured by a cost function that is unknown prior to run-time exploration of the state space.

A Lyapunov-based Approach to Safe Reinforcement Learning

TLDR
This work defines and presents a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints.
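As a schematic of the kind of local constraint involved (an assumed form for illustration, not the paper's exact formulation), a Lyapunov function L for a constraint cost d is required to decrease in expectation under the policy π:

```latex
% Schematic per-state Lyapunov condition (assumed form, for illustration):
% the immediate constraint cost plus the expected Lyapunov value after one
% step under policy \pi must not exceed the current Lyapunov value.
d(x) + \sum_{x'} P(x' \mid x, \pi(x))\, L(x') \;\le\; L(x)
\qquad \text{for every non-terminal state } x .
```

Restricting policy updates to policies that satisfy such per-state linear inequalities keeps the expected cumulative constraint cost bounded during training.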

Safe Exploration Techniques for Reinforcement Learning - An Overview

TLDR
This work surveys different approaches to safety in (semi)autonomous robotics and addresses the issue of how to define safety in real-world applications, where absolute safety is unachievable in a continuous and random environment.

Assured Reinforcement Learning with Formally Verified Abstract Policies

TLDR
This work presents a new reinforcement learning (RL) approach that enables an autonomous agent to solve decision-making problems under constraints, and validates the approach by using it to develop autonomous agents for a flag-collection navigation task and an assisted-living planning problem.

Probabilistic Policy Reuse for Safe Reinforcement Learning

This work introduces Policy Reuse for Safe Reinforcement Learning, an algorithm that combines Probabilistic Policy Reuse and teacher advice for safe exploration in dangerous and continuous state and action spaces.

Goal Probability Analysis in Probabilistic Planning: Exploring and Enhancing the State of the Art

TLDR
A comprehensive empirical analysis of algorithms for MaxProb probabilistic planning clarifies the state of the art, characterizes the behavior of a wide range of heuristic search algorithms, and demonstrates significant benefits of several new algorithm variants.

A comprehensive survey on safe reinforcement learning

TLDR
This work categorizes and analyzes two approaches to Safe Reinforcement Learning: the modification of the optimality criterion (the classic discounted finite/infinite horizon) with a safety factor, and the incorporation of external knowledge or the guidance of a risk metric.

Safe Model-based Reinforcement Learning with Stability Guarantees

TLDR
This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.
...