Safer Reinforcement Learning through Transferable Instinct Networks

Djordje Grbic and Sebastian Risi
Random exploration is one of the main mechanisms through which reinforcement learning (RL) finds well-performing policies. However, it can lead to undesirable or catastrophic outcomes when learning online in safety-critical environments. In fact, safe learning is one of the major obstacles towards real-world agents that can learn during deployment. One way of ensuring that agents respect hard limitations is to explicitly configure boundaries in which they can operate. While this might work in… 
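The abstract cuts off before describing the mechanism, but the title suggests a separate, transferable instinct network that can override the learned policy near unsafe states. A minimal sketch of such a gating rule follows; all names (`instinct_signal`, `threshold`, and the override behavior itself) are illustrative assumptions, not taken from the paper:

```python
def gated_action(policy_action, instinct_action, instinct_signal, threshold=0.5):
    """Return the policy's action unless the instinct network flags danger.

    `instinct_signal` is a hypothetical scalar in [0, 1] emitted by a
    pretrained instinct module; above `threshold`, the instinct's safe
    action overrides the policy's exploratory one.
    """
    if instinct_signal > threshold:
        return instinct_action
    return policy_action

# Far from danger: the policy explores freely.
print(gated_action(policy_action=1.0, instinct_action=0.0, instinct_signal=0.1))  # 1.0
# Near danger: the instinct takes over.
print(gated_action(policy_action=1.0, instinct_action=0.0, instinct_signal=0.9))  # 0.0
```

Because the gate is separate from the policy, it could in principle be reused (transferred) across tasks, which is the property the title emphasizes.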
1 Citation


Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation

This paper presents Triple-Q, the first model-free, simulator-free reinforcement learning algorithm for Constrained Markov Decision Processes (CMDPs) with sublinear regret and zero constraint violation; the algorithm is similar to SARSA for unconstrained MDPs and is computationally efficient.
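Triple-Q itself maintains coupled Q-estimates and a virtual queue; as a much simpler illustration of the constrained objective it targets, here is a generic primal-dual (Lagrangian) update, in which a dual variable grows whenever observed cost exceeds the budget and prices cost into the reward. This is a sketch of the standard CMDP Lagrangian idea, not the paper's algorithm, and all parameter names are illustrative:

```python
def dual_update(lmbda, observed_cost, cost_budget, step_size=0.1):
    """Gradient step on the Lagrange multiplier, projected back to [0, inf)."""
    return max(0.0, lmbda + step_size * (observed_cost - cost_budget))

def penalized_reward(reward, cost, lmbda):
    """Scalarized objective: maximize reward while the multiplier prices cost."""
    return reward - lmbda * cost

lmbda = 0.0
for _ in range(50):  # constraint persistently violated: multiplier rises
    lmbda = dual_update(lmbda, observed_cost=1.0, cost_budget=0.5)
print(round(lmbda, 2))                                       # 2.5
print(round(penalized_reward(reward=1.0, cost=1.0, lmbda=lmbda), 2))  # -1.5
```

As the multiplier grows, cost-incurring behavior becomes unprofitable under the penalized reward, which is how Lagrangian methods steer the policy toward satisfying the constraint.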

References

Benchmarking Safe Exploration in Deep Reinforcement Learning

This work proposes to standardize constrained RL as the main formalism for safe exploration, and presents the Safety Gym benchmark suite, a new slate of high-dimensional continuous control environments for measuring research progress on constrained RL.

Learning to be Safe: Deep RL with a Safety Critic

This work proposes to learn how to be safe in one set of tasks and environments, then uses that learned intuition to constrain future behavior when learning new, modified tasks; this form of safety-constrained transfer learning is studied empirically in three challenging domains.

Conservative Safety Critics for Exploration

This paper theoretically characterizes the trade-off between safety and policy improvement, shows that the safety constraints are likely to be satisfied with high probability during training, derives provable convergence guarantees for the approach, and demonstrates its efficacy on a suite of challenging navigation, manipulation, and locomotion tasks.

Safety-Guided Deep Reinforcement Learning via Online Gaussian Process Estimation

A novel approach is proposed to incorporate safety estimates that guide exploration and policy search in deep reinforcement learning: a cost function captures trajectory-based safety, the state-action value function of this safety cost is formulated as a candidate Lyapunov function, and control-theoretic results are extended to approximate its derivative using online Gaussian process estimation.
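The core idea above, an online, data-driven estimate of safety cost over visited states, can be sketched without a full Gaussian process. The snippet below substitutes a Nadaraya-Watson kernel smoother with a squared-exponential kernel as a deliberately simplified stand-in for GP regression; the states, costs, and lengthscale are all made-up illustrative values:

```python
import math

def sq_exp_kernel(x1, x2, lengthscale=0.5):
    """Squared-exponential kernel, the kernel commonly used in GP regression."""
    return math.exp(-((x1 - x2) ** 2) / (2 * lengthscale ** 2))

def estimate_safety_cost(query, visited_states, observed_costs, lengthscale=0.5):
    """Kernel-weighted average of costs observed at states near `query`."""
    weights = [sq_exp_kernel(query, s, lengthscale) for s in visited_states]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no nearby data: optimistic default (illustrative choice)
    return sum(w * c for w, c in zip(weights, observed_costs)) / total

# 1-D toy data: states near 2.0 incurred cost (unsafe); states near 0.0 did not.
states, costs = [0.0, 0.1, 2.0, 2.1], [0.0, 0.0, 1.0, 1.0]
print(estimate_safety_cost(1.9, states, costs) > 0.9)   # True: near the unsafe region
print(estimate_safety_cost(0.05, states, costs) < 0.1)  # True: near the safe region
```

A real GP posterior would additionally provide a variance estimate, which the paper's approach needs in order to reason about uncertainty far from visited states; the smoother above only reproduces the mean-estimation flavor.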

Safe Exploration Techniques for Reinforcement Learning - An Overview

This work overviews different approaches to safety in (semi-)autonomous robotics and addresses how to define safety in real-world applications, since absolute safety is apparently unachievable in a continuous and stochastic real world.

A comprehensive survey on safe reinforcement learning

This work categorizes and analyzes two approaches to safe reinforcement learning: modifying the optimality criterion (the classic discounted finite/infinite-horizon objective) with a safety factor, and incorporating external knowledge or the guidance of a risk metric.

SafeLife 1.0: Exploring Side Effects in Complex Environments

We present SafeLife, a publicly available reinforcement learning environment that tests the safety of reinforcement learning agents. It contains complex, dynamic, tunable, procedurally generated…

Benchmarking Reinforcement Learning Algorithms on Real-World Robots

This work introduces several reinforcement learning tasks on multiple commercially available robots that present varying levels of learning difficulty, setup, and repeatability; it tests the learning performance of off-the-shelf implementations of four reinforcement learning algorithms and analyzes sensitivity to their hyper-parameters to determine their readiness for various real-world tasks.

Grandmaster level in StarCraft II using multi-agent reinforcement learning

AlphaStar, an agent trained with a multi-agent reinforcement learning algorithm, is evaluated and has reached Grandmaster level, ranking among the top 0.2% of human players in the real-time strategy game StarCraft II.

Modular Deep Reinforcement Learning with Temporal Logic Specifications

We propose an actor-critic, model-free, online reinforcement learning (RL) framework for continuous-state, continuous-action Markov Decision Processes (MDPs) when the reward is highly sparse but…