A comprehensive survey on safe reinforcement learning

@article{Garca2015ACS,
  title={A comprehensive survey on safe reinforcement learning},
  author={Javier Garc{\'i}a and Fernando Fern{\'a}ndez},
  journal={J. Mach. Learn. Res.},
  year={2015},
  volume={16},
  pages={1437--1480}
}
Safe Reinforcement Learning can be defined as the process of learning policies that maximize the expectation of the return in problems in which it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes. We categorize and analyze two approaches to Safe Reinforcement Learning. The first is based on the modification of the optimality criterion, the classic discounted finite/infinite horizon, with a safety factor. The…

Safe Reinforcement Learning in Constrained Markov Decision Processes

TLDR
This paper proposes an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints and takes a stepwise approach for optimizing safety and cumulative reward.

Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?

Safe Distributional Reinforcement Learning

TLDR
This paper formalizes safety in reinforcement learning with a constrained RL formulation in the distributional RL setting and empirically validates its propositions on artificial and real domains against appropriate state-of-the-art safe RL algorithms.

Safe Reinforcement Learning Using Robust Action Governor

TLDR
This paper introduces a framework for safe RL based on integrating an RL algorithm with an add-on safety supervision module, called the Robust Action Governor (RAG), which exploits set-theoretic techniques and online optimization to manage safety-related requirements during learning.

Temporal Logic Guided Safe Reinforcement Learning Using Control Barrier Functions

TLDR
This paper combines temporal logic with control Lyapunov functions to improve exploration and develops a flexible and learnable system that allows users to specify task objectives and constraints in different forms and at various levels.

Safe Reinforcement Learning with Stability & Safety Guarantees Using Robust MPC

TLDR
A formal theory detailing how safety and stability can be enforced through the parameter updates delivered by Reinforcement Learning tools is still lacking; this paper develops one for the generic robust MPC case.

What Is Acceptably Safe for Reinforcement Learning?

TLDR
A high-level argument is proposed that could be used as the basis of a safety case for Reinforcement Learning systems, where the selection of ‘reward’ and ‘cost’ mechanisms would have a critical effect on the outcome of decisions made.

Cautious Reinforcement Learning with Logical Constraints

This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Policies

Safe Reinforcement Learning Applications

TLDR
This research presents a viable novel safety metric as well as an alternative bounding model that can be used with it for applications of RL where safety is important.

Safe Reinforcement Learning Using Robust Control Barrier Functions

TLDR
This paper frames safety as a differentiable robust-control-barrier-function layer in a model-based RL framework and proposes an approach to modularly learn the underlying reward-driven task, independent of safety constraints.
...

References

SHOWING 1-10 OF 124 REFERENCES

Safe exploration for reinforcement learning

TLDR
This paper presents a level-based exploration scheme that is able to generate a comprehensive base of observations while adhering to safety constraints and introduces the concepts of a safety function for determining a state’s safety degree and that of a backup policy to lead the system under control from a critical state back to a safe one.

Safe reinforcement learning in high-risk tasks through policy improvement

TLDR
This paper defines the concept of risk and addresses the problem of safe exploration in the context of RL, and introduces an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks, and that learns efficiently from the experience gathered from the environment.

Smart exploration in reinforcement learning using absolute temporal difference errors

TLDR
This work proposes a new directed exploration method, based on a notion of state controllability, which scales linearly with the number of state features, and is directly applicable to function approximation.

Safe Exploration of State and Action Spaces in Reinforcement Learning

TLDR
The PI-SRL algorithm is introduced, which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment.

Risk-Sensitive Reinforcement Learning Applied to Control under Constraints

TLDR
This paper presents a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies based on weighting the original value function and the risk; it was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column.

Integrating Guidance into Relational Reinforcement Learning

TLDR
This paper presents a solution based on the use of “reasonable policies” to provide guidance in relational reinforcement learning, which makes Q-learning feasible in structural domains by incorporating a relational learner into Q-learning.

Consideration of Risk in Reinforcement Learning

Near-Optimal Reinforcement Learning in Polynomial Time

TLDR
New algorithms for reinforcement learning are presented and it is proved that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes.

Risk-sensitive reinforcement learning algorithms with generalized average criterion

TLDR
The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically, by using a generalized average operator instead of the general optimal operator max to study an important class of learning algorithms, dynamic programming algorithms, and to discuss their convergence from a theoretical point of view.

Practical reinforcement learning using representation learning and safe exploration for large scale Markov decision processes

While creating intelligent agents who can solve stochastic sequential decision making problems through interacting with the environment is the promise of Reinforcement Learning (RL), scaling existing
...