Corpus ID: 14849340

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

@inproceedings{Turchetta2016SafeEI,
  title={Safe Exploration in Finite Markov Decision Processes with Gaussian Processes},
  author={Matteo Turchetta and Felix Berkenkamp and Andreas Krause},
  booktitle={NIPS},
  year={2016}
}
In classical reinforcement learning, agents exploring an environment accept arbitrary short-term loss for long-term gain. This is infeasible for safety-critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDPs). We define safety in terms of an a priori unknown safety constraint that depends on states and actions. We aim to explore the MDP under this… 
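The mechanism described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation (the full algorithm also requires that newly classified states be reachable from, and allow a return to, the already-certified safe set); it only shows how a Gaussian process posterior over an unknown safety function yields a pessimistically certified safe set. The 1-D state space, kernel, threshold h, and scaling beta are illustrative assumptions.

# Minimal sketch (assumptions noted above), not the authors' implementation:
# model an a priori unknown safety function over states with a GP and mark a
# state as safe only if the GP's lower confidence bound clears a threshold h.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

h = 0.0                                               # safety threshold (assumed)
beta = 2.0                                            # confidence-interval scaling (assumed)
states = np.linspace(0.0, 10.0, 101).reshape(-1, 1)   # toy 1-D state space

# Noisy observations of the unknown safety function at states already visited.
visited = np.array([[4.0], [5.0], [6.0]])
observed = np.array([0.8, 1.0, 0.6])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(visited, observed)

mean, std = gp.predict(states, return_std=True)
lower = mean - beta * std                             # pessimistic estimate

# Only states whose lower confidence bound exceeds h are certified safe;
# exploration would then query the most uncertain state within that set.
safe = lower >= h
candidate = states[safe][np.argmax(std[safe])]
print(f"{safe.sum()} states certified safe; next query near {candidate[0]:.2f}")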

Citations

Safe Exploration in Markov Decision Processes with Time-Variant Safety using Spatio-Temporal Gaussian Process
TLDR
A learning algorithm for exploring Markov decision processes (MDPs) is presented, based on the assumption that the safety features are a priori unknown and time-variant, which maximizes the cumulative number of safe states in the worst case with respect to future observations.
Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes
TLDR
This work presents a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP), which prioritizes the exploration of a state if visiting that state significantly improves knowledge of the achievable cumulative reward.
Safe Exploration for Identifying Linear Systems via Robust Optimization
TLDR
This work studies how one can safely identify the parameters of a system model with a desired accuracy and confidence level, and shows how to compute safe regions of action space by gradually growing a ball around the nominal safe action.
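As a rough illustration of the "grow a ball around a known-safe nominal action" idea (not the paper's robust-optimization procedure), the following sketch assumes the safety constraint g(u) <= 0 is L-Lipschitz in the action; the constraint, Lipschitz constant, and nominal action below are made up.

# Sketch only: if g is L-Lipschitz and u0 is safe with margin m = -g(u0) > 0,
# then every action within radius m / L of u0 is also safe.
import numpy as np

L = 2.0                          # assumed (conservative) Lipschitz constant
u0 = np.array([0.1, -0.3])       # nominal action, assumed safe

def g(u):
    """Toy safety constraint: safe iff g(u) <= 0."""
    return np.linalg.norm(u) - 1.0

margin = -g(u0)
radius = margin / L              # ball of provably safe exploratory actions
print(f"safe exploration radius around u0: {radius:.3f}")

# Any perturbation with ||du|| <= radius keeps g(u0 + du) <= 0 by Lipschitzness.
du = radius * np.array([1.0, 0.0])
assert g(u0 + du) <= 1e-9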
Safe Reinforcement Learning in Constrained Markov Decision Processes
TLDR
This paper proposes an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints and takes a stepwise approach for optimizing safety and cumulative reward.
Markov Decision Processes with Unknown State Feature Values for Safe Exploration using Gaussian Processes
TLDR
An exploration algorithm is proposed that, contrary to previous approaches, considers probabilistic transitions and explicitly reasons about the uncertainty over the Gaussian process predictions, and increases the speed of exploration by selecting locations to visit further away from the currently explored area.
Constrained Markov Decision Processes via Backward Value Functions
TLDR
This work models the problem of learning with constraints as a Constrained Markov Decision Process, provides a new on-policy formulation for solving it, and defines a safe policy improvement method that maximizes returns while ensuring that the constraints are satisfied at every step.
Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach
TLDR
A model-free safety specification method that learns the maximal probability of safe operation by carefully combining probabilistic reachability analysis and safe reinforcement learning (RL), and regulates the exploratory policy to avoid dangerous states with high confidence.
Safe Exploration for Constrained Reinforcement Learning with Provable Guarantees
TLDR
This work proposes a model-based safe RL algorithm that uses optimistic exploration with pessimistic constraint enforcement for learning the policy, and shows that it achieves Õ(S√(AH^7K)/(C̄ − C̄_b)) cumulative regret without violating the safety constraints during learning.
SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation
TLDR
This work shows that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance, and argues that the Saute MDP allows one to view the safe RL problem from a different perspective, enabling new features.
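The state-augmentation idea behind Saute MDPs can be sketched as an environment wrapper, assuming a Gym-style step/reset interface and a known per-step cost; the wrapper below, its names, and its zero-reward penalty are illustrative simplifications, not the paper's exact construction.

# Sketch: fold the remaining safety budget into the observation so constraint
# satisfaction is tracked by the state itself rather than an expected penalty.
class SautedEnv:
    def __init__(self, env, cost_fn, budget):
        self.env, self.cost_fn, self.budget = env, cost_fn, budget
        self.remaining = budget

    def reset(self):
        self.remaining = self.budget
        return (self.env.reset(), 1.0)         # full (normalized) budget left

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.remaining -= self.cost_fn(obs, action)
        z = self.remaining / self.budget       # normalized budget remaining
        if z <= 0.0:                           # budget exhausted: unsafe trajectory
            reward, done = 0.0, True           # illustrative penalty; the paper
                                               # instead replaces the reward
        return (obs, z), reward, done, info

class _ToyEnv:
    """Trivial environment used only to exercise the wrapper."""
    def reset(self):
        return 0.0
    def step(self, action):
        return action, 1.0, False, {}

env = SautedEnv(_ToyEnv(), cost_fn=lambda obs, a: abs(a), budget=5.0)
state = env.reset()
state, reward, done, _ = env.step(2.0)         # cost 2 -> budget fraction 0.6
print(state, reward, done)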
Learning to Act Safely with Limited Exposure and Almost Sure Certainty
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number…

References

Showing 1-10 of 25 references
Safe Exploration in Markov Decision Processes
TLDR
This paper proposes a general formulation of safety through ergodicity, shows that imposing safety by restricting attention to the resulting set of guaranteed safe policies is NP-hard, and presents an efficient algorithm for guaranteed safe, but potentially suboptimal, exploration.
Reachability-based safe learning with Gaussian processes
TLDR
This work proposes a method that learns the system's unknown dynamics based on a Gaussian process model, iteratively approximates the maximal safe set, and incorporates safety into the reinforcement learning performance metric, allowing a better integration of safety and learning.
Safe Exploration of State and Action Spaces in Reinforcement Learning
TLDR
The PI-SRL algorithm is introduced, which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks, and efficiently learns from the experience gained from the environment.
Safe Exploration Techniques for Reinforcement Learning - An Overview
TLDR
This work overviews different approaches to safety in (semi)autonomous robotics and addresses the issue of how to define safety in real-world applications, since absolute safety is unachievable in the continuous and random real world.
Safe Exploration for Optimization with Gaussian Processes
TLDR
This work develops an efficient algorithm, SAFEOPT, theoretically guarantees its convergence to a natural notion of the optimum reachable under safety constraints, and demonstrates it on two real applications: movie recommendation and therapeutic spinal cord stimulation.
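A simplified sketch of the acquisition step in a SAFEOPT-style method follows, assuming GP posterior means and standard deviations over a finite candidate set are already available; the full algorithm additionally tracks "expanders" that can enlarge the safe set, which is omitted here, and all numbers are illustrative.

# Sketch: among candidates certified safe by the GP lower bound, evaluate the
# most uncertain one that could still be optimal (a potential maximizer).
import numpy as np

def safeopt_next(mu, sigma, h, beta=2.0):
    lower, upper = mu - beta * sigma, mu + beta * sigma
    safe = lower >= h                          # certified-safe candidates
    if not safe.any():
        raise ValueError("no certified-safe candidate; a safe seed is required")
    best_lower = lower[safe].max()             # pessimistic value of best safe point
    maximizers = safe & (upper >= best_lower)  # safe points that could still be optimal
    width = np.where(maximizers, upper - lower, -np.inf)
    return int(np.argmax(width))               # most uncertain potential maximizer

# Toy usage with made-up posterior statistics over five candidates.
mu = np.array([0.5, 0.9, 0.2, 1.1, -0.4])
sigma = np.array([0.05, 0.30, 0.10, 0.20, 0.50])
print(safeopt_next(mu, sigma, h=0.0))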
Risk-Sensitive Reinforcement Learning Applied to Control under Constraints
TLDR
A model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies by weighting the original value function and the risk; it was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column.
Safe exploration for reinforcement learning
TLDR
This paper presents a level-based exploration scheme that generates a comprehensive base of observations while adhering to safety constraints, and introduces the concepts of a safety function, which determines a state's degree of safety, and a backup policy, which leads the controlled system from a critical state back to a safe one.
Safe Exploration for Active Learning with Gaussian Processes
TLDR
This paper proposes an approach, based on Gaussian processes (GPs), for learning data-based regression models of technical and industrial systems from a limited budget of measurements while exploring new data regions, using a problem-specific GP classifier to identify safe and unsafe regions and a differential-entropy criterion to select relevant data regions to explore.
Reinforcement learning in robotics: A survey
TLDR
This article attempts to strengthen the links between the two research communities by surveying work in reinforcement learning for behavior generation in robots, highlighting both key challenges in robot reinforcement learning and notable successes.
Safe controller optimization for quadrotors with Gaussian processes
TLDR
Experimental results on a quadrotor vehicle indicate that the proposed SafeOpt algorithm enables fast, automatic, and safe optimization of controller parameters without human intervention.