Corpus ID: 235458398

Safe Reinforcement Learning Using Advantage-Based Intervention

@inproceedings{Wagener2021SafeRL,
  title={Safe Reinforcement Learning Using Advantage-Based Intervention},
  author={Nolan Wagener and Byron Boots and Ching-An Cheng},
  booktitle={ICML},
  year={2021}
}
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this… 
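As illustration (a standard formulation, not quoted from the paper), the constrained objective described above is typically posed as a constrained MDP (CMDP): maximize expected discounted return subject to a budget on expected discounted cost,

\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t) \right] \le d,

where c is a cost signal encoding the safety constraint and d is the allowed budget. The open problem noted in the abstract is keeping this constraint satisfied during training, not only for the final policy.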

Citations of this paper

Constrained Variational Policy Optimization for Safe Reinforcement Learning

TLDR
A novel Expectation-Maximization approach to naturally incorporate constraints during policy learning that achieves significantly better constraint satisfaction and sample efficiency than baselines.

Safe Reinforcement Learning with Chance-constrained Model Predictive Control

TLDR
This work addresses the challenge of safe RL by coupling a safety guide based on model predictive control with a modified policy gradient framework in a linear setting with continuous actions, encoding the safety requirements as chance constraints in the MPC formulation.

Ablation Study of How Run Time Assurance Impacts the Training and Performance of Reinforcement Learning Agents

TLDR
By studying multiple RTA approaches in both on-policy and off-policy RL algorithms, this work seeks to understand which RTA methods are most effective, whether the agents become dependent on the RTA, and the importance of reward shaping versus safe exploration in RL agent training.

A Review of Safe Reinforcement Learning: Methods, Theory and Applications

TLDR
A review of the progress of safe RL from the perspectives of methods, theory, and applications, together with problems crucial for deploying safe RL in real-world applications, coined as "2H3W".

Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles

TLDR
This paper proposes a model-based safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints, and constructs a novel barrier-based control policy structure that can guarantee control safety.

When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning

TLDR
This work proposes an algorithm that efficiently learns to detect and avoid irreversible states, proactively asks for help in case the agent does enter them, and exhibits both better sample- and intervention-efficiency than existing methods.

LazyDAgger: Reducing Context Switching in Interactive Imitation Learning

TLDR
LazyDAgger extends the interactive imitation learning (IL) algorithm SafeDAgger to reduce context switches between supervisor and autonomous control, improving the performance and robustness of the learned policy during both learning and execution while limiting the burden on the supervisor.

Safe Reinforcement Learning Using Black-Box Reachability Analysis

TLDR
A Black-box Reachability-based Safety Layer (BRSL) with three main components: a data-driven reachability analysis for a black-box robot model, a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions.

References

Showing 1-10 of 44 references

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization

TLDR
An Optimistic Primal-Dual Proximal Policy Optimization (OPDOP) algorithm in which the value function is estimated by combining least-squares policy evaluation with an additional bonus term for safe exploration; OPDOP is the first provably efficient policy optimization algorithm for CMDPs with safe exploration.

A Lyapunov-based Approach to Safe Reinforcement Learning

TLDR
This work defines and presents a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints.

Conservative Safety Critics for Exploration

TLDR
This paper theoretically characterizes the tradeoff between safety and policy improvement, shows that the safety constraints are likely to be satisfied with high probability during training, derives provable convergence guarantees for the approach, and demonstrates its efficacy on a suite of challenging navigation, manipulation, and locomotion tasks.

Safe reinforcement learning in high-risk tasks through policy improvement

TLDR
This paper defines the concept of risk and addresses the problem of safe exploration in the context of RL, introducing an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks and learns efficiently from experience gathered from the environment.

Constrained Policy Optimization

TLDR
Constrained Policy Optimization (CPO) is proposed, the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration, and allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior all throughout training.
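For orientation, a sketch of the per-iteration problem CPO is usually described as solving (a standard surrogate formulation from the constrained policy optimization literature, paraphrased rather than quoted from the paper):

\pi_{k+1} = \arg\max_{\pi} \; \mathbb{E}_{s,a \sim \pi_k}\!\left[ \frac{\pi(a \mid s)}{\pi_k(a \mid s)} \, A^{\pi_k}_{r}(s,a) \right]
\quad \text{s.t.} \quad
J_C(\pi_k) + \frac{1}{1-\gamma}\, \mathbb{E}_{s,a \sim \pi_k}\!\left[ \frac{\pi(a \mid s)}{\pi_k(a \mid s)} \, A^{\pi_k}_{C}(s,a) \right] \le d,
\qquad \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta,

i.e., a trust-region policy update whose surrogate cost estimate must stay within the constraint budget d at every iteration.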

Lyapunov-based Safe Policy Optimization for Continuous Control

TLDR
Safe policy optimization algorithms based on a Lyapunov approach are presented for continuous-action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to undesirable situations.

Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones

TLDR
This work proposes Recovery RL, an algorithm which navigates this tradeoff by leveraging offline data to learn about constraint violating zones before policy learning and separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely.
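A minimal sketch of the task/recovery switching rule described above; task_policy, recovery_policy, q_risk, and eps_risk are illustrative stand-ins for this example, not the authors' implementation.

# Rough sketch of the Recovery RL-style switching rule.
def select_action(state, task_policy, recovery_policy, q_risk, eps_risk=0.3):
    a_task = task_policy(state)
    # q_risk estimates how likely executing a_task in this state is to lead
    # to a constraint violation (learned from offline violation data).
    if q_risk(state, a_task) > eps_risk:
        return recovery_policy(state)  # hand control to the recovery policy
    return a_task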

Safe Exploration in Continuous Action Spaces

TLDR
This work addresses the problem of deploying a reinforcement learning agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated, and directly adds to the policy a safety layer that analytically solves an action correction formulation per each state.
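As a rough illustration of such a safety layer, the sketch below assumes a single linearized constraint of the form c(s) + g(s)^T a <= C and uses one common closed form for the minimal action correction; the names and setup are assumptions for this example, not code from the paper.

import numpy as np

# Shift the proposed action the minimum amount (along g) needed to satisfy
# the linearized constraint c + g^T a <= C.
def safety_layer(action, g, c, C=0.0):
    violation = c + g.dot(action) - C
    lam = max(0.0, violation / (g.dot(g) + 1e-8))  # closed-form multiplier
    return action - lam * g

# Example: keep the first action dimension at or below 1.0.
a_safe = safety_layer(np.array([1.5, -0.2]), g=np.array([1.0, 0.0]), c=0.0, C=1.0)
# a_safe is approximately [1.0, -0.2]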

Learning to be Safe: Deep RL with a Safety Critic

TLDR
This work proposes to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors when learning new, modified tasks, and empirically studies this form of safety-constrained transfer learning in three challenging domains.

Safe Model-based Reinforcement Learning with Stability Guarantees

TLDR
This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.