Corpus ID: 219969400

Safe Reinforcement Learning via Curriculum Induction

@article{Turchetta2020SafeRL,
  title={Safe Reinforcement Learning via Curriculum Induction},
  author={Matteo Turchetta and Andrey Kolobov and S. Shah and Andreas Krause and Alekh Agarwal},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.12136}
}
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly. In such settings, the agent needs to behave safely not only after but also while learning. To achieve this, existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations during exploration with high probability, but both the probabilistic guarantees and the smoothness assumptions inherent in the priors are not viable in many… 
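The abstract is truncated here, but the title points to a teacher-induced curriculum of safety interventions. As a purely illustrative sketch (the toy environment, the danger test, and the reset-style intervention below are assumptions for exposition, not the paper's algorithm), a teacher can wrap the student's environment so that approaching a dangerous region triggers a reset rather than an unrecoverable failure:

    # Purely illustrative sketch (hypothetical names throughout): a teacher
    # wraps the student's environment so that entering a danger zone triggers
    # a reset instead of an unrecoverable failure.
    import random

    class GridEnv:
        """Toy corridor: reach position 10; positions below 0 are catastrophic."""
        def reset(self):
            self.pos = 5
            return self.pos
        def step(self, action):                    # action in {-1, +1}
            self.pos += action
            done = self.pos >= 10 or self.pos < 0
            reward = 1.0 if self.pos >= 10 else 0.0
            return self.pos, reward, done

    class DangerZoneReset:
        """Teacher intervention: catch near-failures and reset the student."""
        def __init__(self, env, danger=lambda s: s < 1):
            self.env, self.danger = env, danger
        def reset(self):
            return self.env.reset()
        def step(self, action):
            s, r, done = self.env.step(action)
            if self.danger(s):                     # intervene before a real failure
                return self.env.reset(), 0.0, False
            return s, r, done

    env = DangerZoneReset(GridEnv())
    s = env.reset()
    for _ in range(100):                           # a student exploring at random, safely
        s, r, done = env.step(random.choice([-1, 1]))
        if done:
            s = env.reset()

Per the title, the teacher would further sequence such interventions into a curriculum as the student improves; the wrapper above fixes a single intervention for brevity.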

Citations

FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer
TLDR
To the best of our knowledge, this is the first DNN-based optimizer for constrained optimization with a forward-invariance guarantee; the optimizer is shown to train a policy that monotonically decreases the constraint violation while maximizing the cumulative reward.
Safe Reinforcement Learning Using Advantage-Based Intervention
TLDR
This work proposes a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent’s policy using off-the-shelf RL algorithms designed for unconstrained MDPs.
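As a rough illustration of the intervention mechanism described in this summary (the cost critics, threshold, and backup policy below are toy stand-ins, not SAILR's learned components):

    # Toy stand-ins: gate each proposed action with a cost-advantage test and
    # fall back to a backup action when the test fails.
    def q_cost(state, action):           # cost critic: stepping left near the edge is risky
        return 1.0 if state <= 1 and action == -1 else 0.0

    def v_cost(state):                   # cost value of the backup policy (toy)
        return 0.0

    def backup_action(state):            # backup policy: always step away from danger
        return +1

    def safe_action(state, proposed, eta=0.5):
        """Intervene when the cost advantage q_cost(s,a) - v_cost(s) exceeds eta."""
        if q_cost(state, proposed) - v_cost(state) > eta:
            return backup_action(state)
        return proposed

    print(safe_action(state=1, proposed=-1))   # -> 1  (intervention triggered)
    print(safe_action(state=5, proposed=-1))   # -> -1 (passes through unchanged)

The environment wrapped in this gate can then be handed to an off-the-shelf unconstrained RL algorithm, which is the point the summary makes.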
DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention
TLDR
This paper takes the first step in introducing a generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by safe policies, using a new two-player framework for safe RL called DESTA.
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
TLDR
This survey seeks to unify the field of AutoRL, provide a common taxonomy, discuss each area in detail and pose open problems which would be of interest to researchers going forward.
Constrained Policy Optimization via Bayesian World Models
TLDR
This work proposes LAMBDA, a novel model-based approach for policy optimization in safety-critical tasks modeled via constrained Markov decision processes; it utilizes Bayesian world models and harnesses the resulting uncertainty to maximize optimistic upper bounds on the task objective, as well as pessimistic upper bounds on the safety constraints.
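A minimal sketch of the optimistic/pessimistic bookkeeping this summary describes, with a small ensemble standing in for the Bayesian world model (all models, policies, and numbers below are illustrative assumptions):

    # Rough illustration only: a small ensemble stands in for the Bayesian
    # world model; a policy is scored by an optimistic bound on return and a
    # pessimistic (worst-case) bound on accumulated safety cost.
    import random

    def make_model(bias):
        def model(s, a):
            s_next = s + a + bias                 # ensemble members disagree slightly
            reward = 0.1 * s_next
            cost = 0.1 * max(0.0, -s_next)        # being on the negative side is unsafe
            return s_next, reward, cost
        return model

    def rollout(model, policy, s=-3.0, horizon=20):
        ret = cost = 0.0
        for _ in range(horizon):
            a = policy(s)
            s, r, c = model(s, a)
            ret, cost = ret + r, cost + c
        return ret, cost

    ensemble = [make_model(random.uniform(-0.3, 0.3)) for _ in range(5)]
    policy = lambda s: 1.0                        # fixed policy, for illustration

    returns, costs = zip(*(rollout(m, policy) for m in ensemble))
    print(max(returns))                           # optimistic bound on the task objective
    print(max(costs))                             # pessimistic bound on the safety cost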
Curriculum Learning: A Survey
TLDR
This survey shows how such limits have been tackled in the literature, presents curriculum learning instantiations for various machine learning tasks, and constructs a multi-perspective clustering algorithm, linking the discovered clusters with the taxonomy.
Do Androids Dream of Electric Fences? Safety-Aware Reinforcement Learning with Latent Shielding
TLDR
This work presents a novel approach to safety-aware deep reinforcement learning in high-dimensional environments called latent shielding, which leverages internal representations of the environment learnt by model-based agents to “imagine” future trajectories and avoid those deemed unsafe.
SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition
TLDR
This work theoretically characterize why SAFER can enforce safe policy learning and demonstrate its effectiveness on several complex safety-critical robotic grasping tasks inspired by the game Operation, in which SAFER outperforms baseline methods in learning successful policies and enforcing safety.
SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation
TLDR
This work shows that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance, and argues that the Saute MDP allows viewing the safe RL problem from a different perspective, enabling new features.
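The state-augmentation idea lends itself to a short sketch; the wrapper below tracks a remaining safety budget, appends it to the observation, and suppresses reward once the budget is spent (a simplified stand-in, not the Saute RL implementation):

    # Simplified stand-in for "sauteing" an environment: the remaining safety
    # budget z is tracked, appended to the observation, and reward is zeroed
    # once the budget is exhausted.
    class ToyEnv:
        def reset(self):
            self.t = 0
            return 0
        def step(self, action):
            self.t += 1
            cost = 1.0 if action > 0 else 0.0     # aggressive actions incur safety cost
            done = self.t >= 10
            return self.t, 1.0, cost, done        # obs, reward, cost, done

    class SautedEnv:
        def __init__(self, env, budget):
            self.env, self.budget0 = env, budget
        def reset(self):
            self.z = self.budget0
            return (self.env.reset(), self.z)     # observation augmented with budget
        def step(self, action):
            obs, r, c, done = self.env.step(action)
            self.z -= c
            if self.z <= 0:                       # budget spent: no further reward
                r = 0.0
            return (obs, self.z), r, done

    env = SautedEnv(ToyEnv(), budget=3.0)
    obs = env.reset()
    for _ in range(10):
        obs, r, done = env.step(1)
        print(obs, r)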
SCOPE: Safe Exploration for Dynamic Computer Systems Optimization
TLDR
This work evaluates SCOPE’s ability to deliver improved latency while minimizing power constraint violations by dynamically configuring hardware during the execution of a variety of Apache Spark applications.

References

SHOWING 1-10 OF 59 REFERENCES
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
TLDR
This work proposes an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt.
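A toy rendition of the forward/reset interplay described above, with a hand-coded stand-in for the learned reset value (names and thresholds are assumptions):

    # Toy rendition: push forward with the forward policy, but abort early and
    # hand control to the reset policy when a stand-in estimate of the reset
    # policy's chance of success drops below a threshold.
    START, EDGE = 0, 8

    def reset_value(s):                  # placeholder for the learned reset Q-value
        return 1.0 - s / EDGE

    def forward_policy(s):
        return +1                        # greedily move toward the risky edge

    def reset_policy(s):
        return -1 if s > START else 0

    s, q_min = START, 0.3
    while reset_value(s) >= q_min:       # early abort: keep resets likely to succeed
        s += forward_policy(s)
    trajectory = [s]
    while s != START:                    # the reset policy brings the agent back
        s += reset_policy(s)
        trajectory.append(s)
    print(trajectory)                    # e.g. [6, 5, 4, 3, 2, 1, 0]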
Benchmarking Safe Exploration in Deep Reinforcement Learning
TLDR
This work proposes to standardize constrained RL as the main formalism for safe exploration, and presents the Safety Gym benchmark suite, a new slate of high-dimensional continuous control environments for measuring research progress on constrained RL.
Safe Reinforcement Learning via Shielding
TLDR
A new approach to learning optimal policies while enforcing properties expressed in temporal logic by synthesizing a reactive system called a shield, which monitors the actions from the learner and corrects them only if the chosen action would violate the specification.
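A minimal stand-in for such a shield (the real shield is synthesized from a temporal-logic specification; here the allowed actions are simply hardcoded per state):

    # Minimal hardcoded stand-in for a shield: the learner's action passes
    # through only if the shield allows it in the current state.
    ALLOWED = {                          # state -> actions the shield permits
        "near_edge": {"stay", "back"},
        "safe":      {"stay", "back", "forward"},
    }

    def shielded(state, proposed, fallback="stay"):
        if proposed in ALLOWED.get(state, set()):
            return proposed
        return fallback                  # corrected action substituted by the shield

    print(shielded("near_edge", "forward"))   # -> stay (intervention)
    print(shielded("safe", "forward"))        # -> forward (passes through)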
Automatic Goal Generation for Reinforcement Learning Agents
TLDR
This work uses a generator network to propose tasks for the agent to try to achieve, specified as goal states, and shows that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment.
A Lyapunov-based Approach to Safe Reinforcement Learning
TLDR
This work defines and presents a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints.
Reverse Curriculum Generation for Reinforcement Learning
TLDR
This work proposes a method to learn goal-oriented tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved, and generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks.
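A toy 1-D rendition of this start-state curriculum (the random-walk expansion, weak policy, and success-rate thresholds below are illustrative choices, not the paper's procedure):

    # Toy 1-D chain: grow candidate start states by short random walks outward
    # from the goal, keep those the current policy solves at an intermediate rate.
    import random

    GOAL = 10

    def nearby_starts(seeds, n_steps=3, n_new=30):
        starts = []
        for _ in range(n_new):
            s = random.choice(seeds)
            for _ in range(n_steps):
                s += random.choice([-1, 1])
            starts.append(max(0, min(GOAL, s)))
        return starts

    def success_rate(start, policy, trials=20, horizon=15):
        wins = 0
        for _ in range(trials):
            s = start
            for _ in range(horizon):
                s += policy(s)
                if s >= GOAL:
                    wins += 1
                    break
        return wins / trials

    policy = lambda s: random.choice([-1, 1, 1])          # weak, slightly goal-seeking
    candidates = nearby_starts([GOAL])
    curriculum = [s for s in candidates
                  if 0.1 <= success_rate(s, policy) <= 0.9]   # neither trivial nor hopeless
    print(sorted(set(curriculum)))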
Reachability-based safe learning with Gaussian processes
TLDR
This work proposes a novel method that learns the system's unknown dynamics with a Gaussian process model, iteratively approximates the maximal safe set, and incorporates safety into the reinforcement learning performance metric, allowing a better integration of safety and learning.
Safe Model-based Reinforcement Learning with Stability Guarantees
TLDR
This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.
Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
TLDR
A novel algorithm is developed and proven to completely explore the safely reachable part of the MDP without violating the safety constraint; it is demonstrated on digital terrain models for the task of exploring an unknown map with a rover.
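A highly simplified sketch of the safe-set expansion this summary alludes to, with hardcoded lower confidence bounds standing in for the Gaussian process posterior (the real algorithm also checks reachability and returnability, which is omitted here):

    # Hardcoded lower confidence bounds stand in for the GP posterior over a
    # safety value; a state joins the safe set once its bound clears the
    # threshold and it neighbours a state already known to be safe.
    lcb = {0: 1.0, 1: 0.8, 2: 0.6, 3: 0.3, 4: -0.2}
    neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    threshold = 0.5

    safe = {0}                           # seed set, assumed safe a priori
    changed = True
    while changed:
        changed = False
        for s in list(safe):
            for n in neighbours[s]:
                if n not in safe and lcb[n] >= threshold:
                    safe.add(n)
                    changed = True
    print(sorted(safe))                  # -> [0, 1, 2]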
Learning-Based Model Predictive Control for Safe Exploration
TLDR
This paper presents a learning-based model predictive control scheme that can provide provable high-probability safety guarantees and exploits regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories.
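A much-simplified sketch of the trajectory-interval idea: propagate a confidence interval through a learned 1-D model and reject action sequences whose interval ever leaves the safe region (the mean model and its confidence width below are placeholders for the Gaussian process posterior):

    # Propagate an interval [lo, hi] through a learned 1-D model and reject
    # action sequences whose interval ever leaves the safe region. The mean
    # model and confidence width are placeholders for a GP posterior.
    def predict(s, a):
        mean = s + 0.9 * a               # placeholder learned mean dynamics
        width = 0.2 + 0.05 * abs(a)      # stand-in for beta * posterior std
        return mean - width, mean + width

    def trajectory_is_safe(s0, actions, s_min=-1.0, s_max=5.0):
        lo = hi = s0
        for a in actions:
            lo, _ = predict(lo, a)
            _, hi = predict(hi, a)
            if lo < s_min or hi > s_max:
                return False             # confidence interval exits the safe set
        return True

    print(trajectory_is_safe(0.0, [1, 1, 1]))      # stays inside the bounds
    print(trajectory_is_safe(0.0, [2, 2, 2, 2]))   # interval escapes: rejected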