Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

Erdem Biyik, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh. 2019 American Control Conference (ACC).
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models. Our algorithm guarantees safety by leveraging Lipschitz-continuity to ensure that no unsafe states are visited during exploration. Unlike many other existing techniques, the provided safety guarantee is deterministic. Our algorithm is optimized to reduce the number of actions needed for exploring the safe space. We demonstrate the performance of our algorithm in comparison with…
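The Lipschitz-based safety argument above can be made concrete: a candidate state is provably safe if some already-explored state's safety margin dominates the worst-case change of the safety function over the distance between them. The following sketch is illustrative only; the function names, distance metric, and threshold are assumptions, not the paper's implementation.

```python
def lipschitz_safe(candidate, explored, safety_value, L, threshold, dist):
    """Return True if `candidate` is provably safe.

    safety_value: known safety-function values at explored states.
    L: Lipschitz constant of the safety function.
    A candidate is certified safe when some explored state's margin over the
    threshold exceeds the worst-case change L * dist(s, candidate).
    """
    return any(
        safety_value[s] - L * dist(s, candidate) >= threshold
        for s in explored
    )
```

Because the certificate relies only on the Lipschitz bound and measured values, the resulting guarantee is deterministic rather than probabilistic, matching the distinction drawn in the abstract.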
Probabilistic Safety Constraints for Learned High Relative Degree System Dynamics
Given streaming observations of the system state, Bayesian learning is used to obtain a distribution over the system dynamics; the system behavior is then optimized, and safety is ensured with high probability, by specifying a chance constraint over a control barrier function.
Reinforcement Learning with Quantitative Verification for Assured Multi-Agent Policies
This work introduces a new approach that combines multi-agent reinforcement learning with a formal verification technique termed quantitative verification that constrains agent behaviours in ways that ensure the satisfaction of requirements associated with the safety, reliability, and other non-functional aspects of the decision-making problem being solved.
Safe Exploration for Interactive Machine Learning
A novel framework is introduced that renders any existing unsafe IML algorithm safe and works as an add-on that takes suggested decisions as input and exploits regularity assumptions in terms of a Gaussian process prior in order to efficiently learn about their safety.
Delaunay-based Derivative-free Optimization via Global Surrogates with Safe and Exact Function Evaluations
A new, safety-constrained variant of Delaunay-based derivative-free optimization via Global Surrogate, dubbed S-DOGS, to automatically learn the safe region of parameter space while simultaneously characterizing and optimizing the utility function under consideration, under the assumption that the underlying safety constraints are Lipschitz continuous and the safe region is connected and compact.
Control Barriers in Bayesian Learning of System Dynamics
This paper uses a matrix variate Gaussian process (MVGP) regression approach with efficient covariance factorization to learn the drift and input gain terms of a nonlinear control-affine system and shows that a safe control policy can be synthesized for systems with arbitrary relative degree and probabilistic CLF-CBF constraints by solving a second-order cone program.
Regret Analysis of Learning-Based MPC with Partially-Unknown Cost Function
This paper proposes the use of a finite-horizon oracle controller with perfect knowledge of all system parameters as a reference for optimal control actions and develops learning-based policies that achieve low regret with respect to this oracle finite-horizon controller.
Automated Verification and Control of Large-Scale Stochastic Cyber-Physical Systems: Compositional Techniques
This dissertation provides three novel compositional techniques to analyze and control large-scale stochastic CPSs in an automated as well as formal fashion.
Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art
This paper reviews algorithms from a number of domains, including reinforcement learning, Gaussian process regression and classification, evolutionary computing, and active learning, and concludes by explaining how the algorithms are connected and offering suggestions for future research.
Provably Safe PAC-MDP Exploration Using Analogies
This work proposes Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics, which exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense.
Multi-Agent Safe Planning with Gaussian Processes
A novel multi-agent safe learning algorithm, trained in a decentralized fashion, that enables safe navigation when multiple different agents are present in the environment.
Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
A novel algorithm is developed and proved to completely explore the safely reachable part of the MDP without violating the safety constraint; it is demonstrated on digital terrain models for the task of exploring an unknown map with a rover.
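A common mechanism behind GP-based safe exploration of this kind is to treat a state as safe only when the pessimistic (lower-confidence-bound) estimate of its safety feature still clears the threshold. A minimal sketch, assuming a precomputed GP posterior per state; all names and the confidence scaling are illustrative, not the paper's code.

```python
import numpy as np

def safe_states(mu, sigma, beta, threshold):
    """Classify states as safe using a pessimistic GP estimate.

    mu, sigma: GP posterior mean and standard deviation per state.
    beta: confidence scaling (larger beta -> more conservative).
    A state is kept only if its lower confidence bound mu - beta * sigma
    stays at or above the safety threshold.
    """
    lower_bound = mu - beta * sigma
    return lower_bound >= threshold
```

As the GP observes more transitions, sigma shrinks and the certified safe set grows, which is what lets such methods eventually cover the safely reachable part of the MDP.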
Safe Exploration in Markov Decision Processes
This paper proposes a general formulation of safety through ergodicity, and shows that imposing safety by restricting attention to the resulting set of guaranteed safe policies is NP-hard, and presents an efficient algorithm for guaranteed safe, but potentially suboptimal, exploration.
Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes
This work presents a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP), which prioritizes the exploration of a state if visiting that state significantly improves the knowledge on the achievable cumulative reward.
Reachability-based safe learning with Gaussian processes
This work proposes a novel method that uses a principled approach to learn the system's unknown dynamics based on a Gaussian process model, iteratively approximates the maximal safe set, and further incorporates safety into the reinforcement learning performance metric, allowing a better integration of safety and learning.
Fast Safe Mission Plans for Autonomous Vehicles
This work proposes a novel combination of sampling-based motion planning with safe control synthesis methods for generating safe high-level plans in real-time for guaranteeing safety in the real-world deployment of robots and autonomous cyber-physical systems.
Safe Exploration of State and Action Spaces in Reinforcement Learning
The PI-SRL algorithm is introduced, which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and efficiently learns from the experience gained from the environment.
Safe Control under Uncertainty with Probabilistic Signal Temporal Logic
This work proposes the new Probabilistic Signal Temporal Logic (PrSTL), an expressive language to define stochastic properties and enforce probabilistic guarantees on them, and presents an efficient algorithm to reason about safe controllers given the constraints derived from the PrSTL specification.
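The core reformulation behind probabilistic predicates of this kind is that a Gaussian chance constraint P(y >= 0) >= 1 - delta reduces to a deterministic check on the mean and standard deviation of y. A minimal sketch under that Gaussian assumption; the notation is illustrative, not the paper's.

```python
from statistics import NormalDist

def chance_predicate_holds(mean, std, delta):
    """Check a single Gaussian chance constraint P(y >= 0) >= 1 - delta.

    Quantile reformulation: the constraint holds exactly when
    mean - z_{1-delta} * std >= 0, where z_{1-delta} is the standard
    normal quantile at level 1 - delta.
    """
    z = NormalDist().inv_cdf(1.0 - delta)
    return mean - z * std >= 0.0
```

This is what makes such specifications tractable: the probabilistic predicate becomes a convex (here, linear) constraint in the distribution's parameters.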
A comprehensive survey on safe reinforcement learning
This work categorizes and analyzes two approaches to Safe Reinforcement Learning: modifying the optimality criterion (the classic discounted finite/infinite horizon) with a safety factor, and incorporating external knowledge or the guidance of a risk metric.
Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes
This paper analyzes a connection between risk-sensitive and minimax criteria for discrete-time, finite-state Markov decision processes (MDPs). We synthesize optimal policies with respect to both…
Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics
A generalized algorithm that allows for multiple safety constraints separate from the objective is presented and it is demonstrated that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters in experiments on a quadrotor vehicle.
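Safe Bayesian optimization with multiple constraints, as described above, typically evaluates only parameters for which every constraint is satisfied pessimistically, and among those picks an optimistic candidate for the objective. A hedged sketch assuming precomputed GP posteriors; all names, shapes, and the confidence scaling are assumptions for illustration.

```python
import numpy as np

def select_safe_parameter(mu_c, sigma_c, mu_f, sigma_f, beta):
    """Pick the next parameter to evaluate under multiple safety constraints.

    mu_c, sigma_c: (n_constraints, n_params) GP posteriors for constraints,
                   each required to be >= 0 for safety.
    mu_f, sigma_f: (n_params,) GP posterior for the objective.
    Returns the index of the most promising provably-safe parameter,
    or None if no parameter is certified safe.
    """
    # Pessimistic check: every constraint's lower confidence bound >= 0.
    safe = np.all(mu_c - beta * sigma_c >= 0.0, axis=0)
    if not safe.any():
        return None
    # Optimistic objective value, restricted to safe candidates.
    ucb = mu_f + beta * sigma_f
    ucb[~safe] = -np.inf
    return int(np.argmax(ucb))
```

Keeping the constraints separate from the objective, as the entry above emphasizes, is what allows a low-performing but safe parameter to remain evaluable while a high-performing but uncertified one is excluded.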