Safe Distributional Reinforcement Learning

Jianyi Zhang and P. Weng. Published in International Conference on Distributed Artificial Intelligence, 26 February 2021.
Safety in reinforcement learning (RL) is a key property in both training and execution in many domains such as autonomous driving or finance. In this paper, we formalize it with a constrained RL formulation in the distributional RL setting. Our general model accepts various definitions of safety (e.g., bounds on expected performance, CVaR, variance, or probability of reaching bad states). To ensure safety during learning, we extend a safe policy optimization method to solve our problem. The… 
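Several of the safety definitions the paper lists (CVaR, variance, probability of reaching bad states) are functionals of the return distribution that distributional RL learns. As an illustration only (not the paper's algorithm), a minimal sketch of conditional value-at-risk computed from samples of that distribution, which is the quantity a CVaR safety constraint would bound:

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional value-at-risk: mean of the worst alpha-fraction of returns.

    `returns` is assumed to be a sample from the learned return
    distribution; a CVaR safety constraint requires this tail mean to
    stay above a threshold.
    """
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))  # size of the worst tail
    return returns[:k].mean()

samples = [-5.0, -1.0, 0.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0]
print(cvar(samples, alpha=0.2))  # mean of the worst 2 of 10 samples -> -3.0
```

With `alpha=1.0` this reduces to the ordinary expected return, which is why CVaR interpolates between risk-neutral and worst-case objectives.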

Robust Reinforcement Learning with Distributional Risk-averse formulation

This paper approximates Robust Reinforcement Learning constrained with a Φ-divergence using an approximate risk-averse formulation, and shows that the classical reinforcement learning objective can be robustified via standard-deviation penalization.
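The paper's central correspondence is between a Φ-divergence uncertainty set and a variance-style penalty. A minimal sketch of the resulting surrogate objective (the coefficient `beta` is an assumed free parameter, not from the paper):

```python
import numpy as np

def risk_averse_objective(returns, beta=0.5):
    """Mean-minus-standard-deviation surrogate for robust RL.

    Penalizing the standard deviation of returns approximates optimizing
    the worst case over a Phi-divergence uncertainty set of models;
    beta loosely plays the role of the uncertainty-set radius.
    """
    returns = np.asarray(returns, dtype=float)
    return returns.mean() - beta * returns.std()

safe = [1.0, 1.1, 0.9, 1.0]     # low-variance returns
risky = [3.0, -2.0, 4.0, -1.0]  # same mean, high variance
print(risk_averse_objective(safe), risk_averse_objective(risky))
```

Both policies have mean return 1.0, but the penalized objective prefers the low-variance one, which is exactly the robustifying effect described.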

Risk-Averse Zero-Order Trajectory Optimization

A simple but effective method for managing risk in zero-order trajectory optimization is introduced, involving probabilistic safety constraints and a balance between optimism in the face of epistemic uncertainty and the pessimism of an ensemble of stochastic neural networks.
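The two ingredients — a probabilistic safety constraint and ensemble-based pessimism — can be sketched as a trajectory-scoring rule. All names and constants below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def score_trajectories(returns_per_model, safety_flags, kappa=1.0, delta=0.1):
    """Risk-averse scoring sketch for zero-order trajectory optimization.

    returns_per_model: (n_models, n_trajectories) returns predicted by an
        ensemble; cross-model disagreement proxies epistemic uncertainty.
    safety_flags: (n_models, n_trajectories) booleans, True if that model
        predicts the trajectory stays safe.
    A trajectory is admissible if the ensemble's estimated probability of
    safety is at least 1 - delta; admissible ones are ranked by mean
    return minus kappa times the ensemble standard deviation.
    """
    mean = returns_per_model.mean(axis=0)
    std = returns_per_model.std(axis=0)
    p_safe = safety_flags.mean(axis=0)
    score = mean - kappa * std
    score[p_safe < 1.0 - delta] = -np.inf  # probabilistic safety constraint
    return score

returns = np.array([[1.0, 5.0], [1.2, -4.0], [0.8, 6.0]])
flags = np.array([[True, True], [True, False], [True, True]])
best = int(np.argmax(score_trajectories(returns, flags)))
print(best)  # trajectory 0: safe under every model, low disagreement
```

In a full planner this score would rank candidate action sequences sampled by the zero-order optimizer (e.g., CEM), with the best survivor executed at each step.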

ACReL: Adversarial Conditional value-at-risk Reinforcement Learning

ACReL casts CVaR-optimal policy learning as a Stackelberg game, enabling the use of deep RL architectures and training algorithms. Empirical experiments show that ACReL matches a state-of-the-art CVaR RL baseline at retrieving CVaR-optimal policies.

A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning

The proposed approach is a two-player zero-sum game between a policy player and an adversary that perturbs the policy player's state transitions given a finite budget; it is shown that the closer the players are to the game's equilibrium point, the closer the learned policy is to the CVaR-optimal one, with a risk tolerance explicitly related to the adversary's budget.

Safe Reinforcement Learning via Shielding

This work proposes a new approach to learning optimal policies while enforcing properties expressed in temporal logic: it synthesizes a reactive system called a shield that monitors the actions chosen by the learner and corrects them only if the chosen action would violate the specification.
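The runtime behavior of a shield is simple even though synthesizing one from a temporal-logic specification is not. A minimal sketch of the monitoring-and-correction loop, where `is_safe` stands in for the synthesized reactive system (an assumption of this sketch):

```python
def shielded_step(state, action, is_safe, fallback_actions):
    """Shield sketch: pass the learner's action through unchanged unless
    it would violate the specification, in which case substitute the
    first fallback action the shield certifies as safe.
    """
    if is_safe(state, action):
        return action
    for a in fallback_actions:
        if is_safe(state, a):
            return a  # minimally corrected action
    raise RuntimeError("shield has no safe action available")

# Toy 1-D example: moving right of position 3 is unsafe.
is_safe = lambda x, dx: x + dx <= 3
print(shielded_step(3, +1, is_safe, fallback_actions=[0, -1]))  # -> 0
```

Because the learner still receives reward for the corrected action, training proceeds normally while the specification is never violated at execution time.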

Safe Model-based Reinforcement Learning with Stability Guarantees

This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.

Benchmarking Safe Exploration in Deep Reinforcement Learning

This work proposes to standardize constrained RL as the main formalism for safe exploration, and presents the Safety Gym benchmark suite, a new slate of high-dimensional continuous control environments for measuring research progress on constrained RL.

Reinforcement Learning with Convex Constraints

This paper proposes an algorithmic scheme that can handle any constraints requiring expected values of some vector of measurements to lie in a convex set; it matches previous algorithms that enforce safety via constraints, but can also enforce new properties these algorithms do not incorporate, such as diversity.

A comprehensive survey on safe reinforcement learning

This work categorizes and analyzes two approaches to safe reinforcement learning: modifying the optimality criterion (the classic discounted finite/infinite-horizon objective) with a safety factor, and incorporating external knowledge or the guidance of a risk metric.

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

This work proposes a controller architecture that combines a model-free RL-based controller with model-based controllers utilizing control barrier functions (CBFs) and on-line learning of the unknown system dynamics, in order to ensure safety during learning.
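A CBF-based safety layer makes the smallest correction to the RL action that keeps the system inside the safe set. For a 1-D single integrator the usual CBF quadratic program has a closed-form solution, sketched below; the dynamics, set, and constants are illustrative assumptions, not the paper's setup:

```python
def cbf_filter(x, u_rl, x_max=1.0, gamma=2.0):
    """Control-barrier-function safety filter sketch for the single
    integrator x' = u with safe set h(x) = x_max - x >= 0.

    The CBF condition h' + gamma*h >= 0 reduces to u <= gamma*(x_max - x),
    so the minimal correction to the RL controller's action u_rl is a
    one-sided clip (the closed-form solution of the CBF QP here).
    """
    u_bound = gamma * (x_max - x)
    return min(u_rl, u_bound)

print(cbf_filter(x=0.9, u_rl=5.0))  # clipped so x cannot overshoot x_max
print(cbf_filter(x=0.0, u_rl=0.5))  # far from the boundary: unchanged
```

In the paper's architecture the model-based CBF controller plays this filtering role around the model-free RL policy, with the dynamics model refined online.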

Reward Constrained Policy Optimization

This work presents a novel multi-timescale approach for constrained policy optimization, called Reward Constrained Policy Optimization (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint-satisfying one.
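The multi-timescale idea is that the policy trains on a penalized reward while a Lagrange multiplier adapts more slowly to the observed constraint cost. A minimal sketch of that coupling (learning rates and limits are assumed constants, not RCPO's):

```python
def rcpo_update(reward, cost, lam, cost_limit=0.1, lam_lr=0.01):
    """RCPO-style penalty sketch: shape the reward with a learned
    Lagrange multiplier and move the multiplier on a slower timescale.

    The policy sees the penalized reward r - lam * c; lam rises when the
    cost exceeds the limit, falls otherwise, and is projected to stay
    non-negative.
    """
    penalized_reward = reward - lam * cost
    lam = max(0.0, lam + lam_lr * (cost - cost_limit))
    return penalized_reward, lam

r, lam = rcpo_update(reward=1.0, cost=0.5, lam=0.2)
print(r, lam)  # penalized reward 0.9; multiplier grows since cost > limit
```

At convergence the multiplier settles at a value for which the penalized objective's optimum satisfies the constraint, which is the fixed point RCPO's convergence analysis targets.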

Constrained Policy Optimization

Constrained Policy Optimization (CPO) is proposed, the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees of near-constraint satisfaction at each iteration; it allows training neural network policies for high-dimensional control while making guarantees about policy behavior throughout training.

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

A novel algorithm is developed and proven able to completely explore the safely reachable part of the MDP without violating the safety constraint; it is demonstrated on digital terrain models for the task of exploring an unknown map with a rover.
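The core certification step is conservative: a state counts as safe only when the Gaussian process's pessimistic confidence bound on the safety feature clears the threshold. A sketch of just that test (the full algorithm additionally requires reachability and returnability, omitted here; all values are illustrative):

```python
import numpy as np

def certified_safe(mu, sigma, h_min, beta=2.0):
    """SafeMDP-style sketch: a state is certified safe when the GP
    posterior's lower confidence bound mu - beta*sigma on the safety
    feature (e.g., terrain-slope margin) stays above h_min.
    """
    return mu - beta * sigma >= h_min

mu = np.array([1.0, 0.6, 0.2])    # GP posterior mean of the safety margin
sigma = np.array([0.1, 0.1, 0.3])  # posterior standard deviation
print(certified_safe(mu, sigma, h_min=0.3))  # only well-certified states
```

Visiting certified states shrinks the GP's uncertainty at their neighbors, which is how the algorithm provably expands the safe set without ever stepping outside it.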

Safe Reinforcement Learning via Formal Methods: Toward Safe Control Through Proof and Learning

It is proven that the approach of incorporating knowledge about safe control into learning systems preserves safety guarantees, and it is demonstrated that the empirical performance benefits provided by reinforcement learning are retained.