SAMBA: Safe Model-Based & Active Reinforcement Learning

Authors: Alexander Imani Cowen-Rivers, Daniel Palenicek, Vincent Moens, Mohammed Abdullah, Aivar Sootla, Jun Wang and Haitham Ammar
In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation, optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and…
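The conditional-value-at-risk constraint mentioned above can be estimated empirically as the mean of the worst (1 − α) fraction of sampled rollout costs. A minimal sketch (the function name and the sample costs are illustrative, not from the paper):

```python
import numpy as np

def cvar(costs, alpha=0.9):
    """Empirical CVaR_alpha: mean of the worst (1 - alpha) fraction of costs."""
    costs = np.sort(np.asarray(costs, dtype=float))
    k = int(np.ceil((1 - alpha) * len(costs)))  # number of tail samples
    return costs[-k:].mean()

# Example: ten sampled rollout costs; with alpha=0.8 the worst 20% are {9, 10}
samples = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
tail_mean = cvar(samples, alpha=0.8)  # -> 9.5
```

A constrained policy search would then require this tail mean, rather than the plain average cost, to stay below a safety budget.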
Risk Sensitive Model-Based Reinforcement Learning using Uncertainty Guided Planning
Uncertainty-guided cross-entropy method planning is proposed, which penalises action sequences that result in high-variance state predictions during model rollouts, guiding the agent towards known, low-uncertainty areas of the state space.
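The idea of penalising high-variance rollouts inside a cross-entropy method (CEM) planner can be sketched as follows, assuming a toy ensemble of linear models standing in for a learned dynamics model (all names and the quadratic reward are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def score(action_seq, state, models, beta=1.0):
    """Score a candidate action sequence: toy reward minus a variance penalty
    over the disagreement of the ensemble's state predictions."""
    total = 0.0
    s = np.tile(state, (len(models), 1))  # one particle per ensemble member
    for a in action_seq:
        preds = np.array([A @ si + B * a for (A, B), si in zip(models, s)])
        total += -np.mean(np.sum(preds ** 2, axis=1))    # toy reward: stay near origin
        total += -beta * np.mean(np.var(preds, axis=0))  # penalise epistemic variance
        s = preds
    return total

def cem_plan(state, models, horizon=5, pop=64, elites=8, iters=10):
    """Standard CEM loop: sample sequences, keep elites, refit the Gaussian."""
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        seqs = rng.normal(mu, sigma, size=(pop, horizon))
        scores = np.array([score(seq, state, models) for seq in seqs])
        best = seqs[np.argsort(scores)[-elites:]]
        mu, sigma = best.mean(axis=0), best.std(axis=0) + 1e-3
    return mu

# Small ensemble of perturbed linear dynamics (2-d state, scalar action)
base_A = 0.9 * np.eye(2)
models = [(base_A + 0.05 * rng.normal(size=(2, 2)), np.array([0.1, 0.0]))
          for _ in range(5)]
plan = cem_plan(np.array([1.0, -1.0]), models)
```

Raising `beta` trades off reward for staying in regions where the ensemble members agree, which is the risk-sensitive behaviour the summary describes.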
Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention
MBHI is proposed, a novel framework for safe model-based reinforcement learning, which ensures safety at the state level and can effectively avoid both "local" and "non-local" catastrophes.
Learning Robust Controllers Via Probabilistic Model-Based Policy Search
It is shown that enforcing a lower bound to the likelihood noise in the Gaussian Process dynamics model regularizes the policy updates and yields more robust controllers.
DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention
This paper takes the first step in introducing a generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by safe policies, using a new two-player framework for safe RL called DESTA.
Robot Reinforcement Learning on the Constraint Manifold
Reinforcement Learning in robotics is extremely challenging, as these tasks raise many practical issues which are normally not considered in the Machine Learning literature. One of the most…
Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles
This paper proposes a model-based safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints, and constructs a novel barrier-based control policy structure that can guarantee control safety.
Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?
This paper highlights the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from the 2020 NeurIPS competition on Black-Box Optimisation for Machine Learning.
HEBO: Heteroscedastic Evolutionary Bayesian Optimisation
This work presents non-conventional modifications to the surrogate model and acquisition maximisation process, and shows that such a combination is superior to all baselines provided by the Bayesmark package.
Conservative Safety Critics for Exploration
This paper theoretically characterizes the tradeoff between safety and policy improvement, shows that the safety constraints are likely to be satisfied with high probability during training, derives provable convergence guarantees for the approach, and demonstrates its efficacy on a suite of challenging navigation, manipulation, and locomotion tasks.
An Empirical Study of Assumptions in Bayesian Optimisation
It is concluded that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, that multi-objective acquisition ensembles with Pareto-front solutions significantly improve queried configurations, and that robust acquisition maximisation affords empirical advantages relative to its non-robust counterparts.


PILCO: A Model-Based and Data-Efficient Approach to Policy Search
PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.
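PILCO's key mechanic is propagating a state *distribution*, not a point, through the learned probabilistic model. PILCO does this analytically with Gaussian-process moment matching; the sketch below approximates the same idea by Monte Carlo, with a hand-written stand-in for the learned model (all names and the toy dynamics are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamics_mean_var(state, action):
    """Stand-in probabilistic model: mean and variance of the next state."""
    mean = 0.9 * state + 0.1 * action
    var = 0.01 + 0.05 * state ** 2  # state-dependent model uncertainty
    return mean, var

def propagate(mu, var, policy, horizon=10, n_particles=1000):
    """Monte-Carlo stand-in for PILCO's analytic moment matching: push a
    Gaussian state distribution through the model, resampling from the
    model's predictive distribution at every step."""
    particles = rng.normal(mu, np.sqrt(var), size=n_particles)
    for _ in range(horizon):
        a = policy(particles)
        m, v = dynamics_mean_var(particles, a)
        particles = rng.normal(m, np.sqrt(v))
    return particles.mean(), particles.var()

mu_T, var_T = propagate(0.0, 1.0, policy=lambda s: -0.5 * s)
```

Because the long-term prediction carries a variance, the policy gradient sees model uncertainty directly, which is how PILCO avoids exploiting model errors.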
Model-Based Active Exploration
This paper introduces an efficient active exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events and shows empirically that in semi-random discrete environments where directed exploration is critical to make progress, MAX is at least an order of magnitude more efficient than strong baselines.
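The exploration signal MAX relies on is disagreement across the forward-model ensemble: states where the members' predictions diverge are novel. A minimal sketch of such a utility (the variance-based measure here is a common simplification; MAX itself uses a divergence between the members' predictive distributions):

```python
import numpy as np

def disagreement_utility(preds):
    """Novelty utility: average variance across ensemble members' predictions.
    `preds` has shape (n_members, state_dim)."""
    return float(np.mean(np.var(preds, axis=0)))

# Members agree on a well-visited transition...
ensemble_known = np.array([[1.0, 2.0], [1.01, 2.0], [0.99, 2.0]])
# ...but disagree on a novel one, yielding a larger utility
ensemble_novel = np.array([[1.0, 2.0], [3.0, 0.0], [-1.0, 4.0]])
```

Planning to maximise this utility sends the agent towards transitions the models have not yet learned, which is the "plan to observe novel events" behaviour the summary describes.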
Safe Exploration in Continuous Action Spaces
This work addresses the problem of deploying a reinforcement learning agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated, and directly adds to the policy a safety layer that analytically solves an action correction formulation per each state.
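For a single linearised constraint, the per-state action correction this kind of safety layer solves has a closed form: project the proposed action onto the half-space where the constraint holds. A sketch under that single-constraint assumption (names are illustrative):

```python
import numpy as np

def safety_layer(action, g, c):
    """Project `action` onto the half-space {a : g.a + c <= 0}.
    The minimal-change correction has the closed-form multiplier
    lam = max(0, (g.a + c) / (g.g))."""
    lam = max(0.0, (g @ action + c) / (g @ g))
    return action - lam * g

a = np.array([1.0, 1.0])
g = np.array([1.0, 0.0])
safe_a = safety_layer(a, g, c=-0.5)  # constraint: a[0] - 0.5 <= 0 -> [0.5, 1.0]
```

Because the correction is analytic, it can sit after any policy network and run at every step without an inner optimisation loop.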
Ready Policy One: World Building Through Active Learning
This paper introduces Ready Policy One (RP1), a framework that views MBRL as an active learning problem, where it aims to improve the world model in the fewest samples possible, by utilizing a hybrid objective function which crucially adapts during optimization.
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
This work proposes a model-based RL framework built on probabilistic Model Predictive Control with Gaussian Processes to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors, and provides theoretical guarantees for first-order optimality in GP-based transition models with deterministic approximate inference for long-term planning.
Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes
This work presents a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP), which prioritizes the exploration of a state if visiting that state significantly improves the knowledge on the achievable cumulative reward.
Learning-Based Model Predictive Control for Safe Exploration
This paper presents a learning-based model predictive control scheme that can provide provable high-probability safety guarantees and exploits regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories.
A Lyapunov-based Approach to Safe Reinforcement Learning
This work defines and presents a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints.
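At the core of Lyapunov-style safety arguments is the decrease condition: a candidate function V must shrink along trajectories of the dynamics f. A minimal numerical sketch of checking that condition on sampled states, with toy stable dynamics and a quadratic candidate (all choices here are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def lyapunov_decreases(f, V, states, margin=0.0):
    """Check the decrease condition V(f(s)) - V(s) <= -margin on a batch of
    sampled states; True means every sampled state passes the test."""
    return all(V(f(s)) - V(s) <= -margin for s in states)

f = lambda s: 0.8 * s                  # toy stable dynamics
V = lambda s: float(np.dot(s, s))      # quadratic Lyapunov candidate
grid = [np.array([x, y]) for x in (-1.0, 0.5, 1.0) for y in (-1.0, 0.5, 1.0)]
ok = lyapunov_decreases(f, V, grid)
```

The paper's contribution is to turn such conditions into local, linear constraints on the policy so they can be enforced during training rather than merely checked afterwards.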
Benchmarking Safe Exploration in Deep Reinforcement Learning
Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies by trial and error. In many environments, safety is a critical concern and certain errors are…
Safe Model-based Reinforcement Learning with Stability Guarantees
This paper presents a learning algorithm that explicitly considers safety, defined in terms of stability guarantees, and extends control-theoretic results on Lyapunov stability verification and shows how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates.