Corpus ID: 62439933

Safe Reinforcement Learning

@inproceedings{Thomas2015SafeRL,
  title={Safe Reinforcement Learning},
  author={P. S. Thomas},
  year={2015}
}

Citations

Data efficient reinforcement learning with off-policy and simulated data
Reinforcement learning for personalization: A systematic literature review
TLDR: Presents a survey of reinforcement learning (RL) for personalization and its applications in this rapidly changing field.
Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients
TLDR: This letter demonstrates how hindsight can be introduced to policy gradient methods, generalizing the idea to a broad class of successful algorithms, and shows that hindsight leads to a remarkable increase in sample efficiency.
Hindsight policy gradients
TLDR: This paper shows how hindsight can be introduced to likelihood-ratio policy gradient methods, generalizing this capacity to an entire class of highly successful algorithms.
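
The hindsight idea in the two entries above lends itself to a compact illustration: a failed episode is relabeled with a goal it actually reached, turning it into useful training signal for a goal-conditioned policy. A minimal sketch, assuming episodic trajectories; the `reward_fn` interface and the final-goal relabeling rule are illustrative assumptions, not the papers' exact method.

```python
def hindsight_relabel(trajectory, reward_fn):
    """Relabel a failed episode with a goal it actually achieved.

    trajectory: list of (state, action, achieved_goal) tuples.
    reward_fn(achieved_goal, goal) -> float, e.g. 1.0 on a match else 0.0.
    Returns (new_goal, rewards) suitable for a likelihood-ratio
    policy-gradient update conditioned on new_goal.
    """
    # Illustrative relabeling rule: pretend the final achieved goal was
    # the intended one, so the episode becomes a success for that goal.
    new_goal = trajectory[-1][2]
    rewards = [reward_fn(achieved, new_goal)
               for (_, _, achieved) in trajectory]
    return new_goal, rewards
```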
Some Recent Applications of Reinforcement Learning
Five relatively recent applications of reinforcement learning methods are described. These examples were chosen to illustrate a diversity of application types and the engineering needed to build …
POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning
TLDR: Introduces a new optimization objective that produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and does so in the batch off-policy settings typical of healthcare, where only retrospective data are available.
Dead-ends and Secure Exploration in Reinforcement Learning
TLDR: Proposes a condition for exploration, called security, and introduces secure random-walk; the accompanying theory can be used to cap any given exploration policy with a guarantee that the capped policy is secure.
Safe Policy Improvement with Baseline Bootstrapping
TLDR: This paper adopts the safe policy improvement (SPI) approach, inspired by the knows-what-it-knows paradigm, and develops two computationally efficient bootstrapping algorithms, one value-based and one policy-based, each accompanied by theoretical SPI bounds.
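
A minimal sketch of the bootstrapping idea as described in that abstract: the improved policy may deviate from the baseline only on state-action pairs the batch data covers well. The `n_min` threshold and dict-based interfaces are assumptions for illustration; the paper's algorithms and bounds are more involved.

```python
def spibb_policy(q, baseline, counts, n_min=10):
    """Safe policy improvement with baseline bootstrapping (sketch).

    q: dict mapping (state, action) -> Q-value estimated from batch data.
    baseline: dict mapping state -> {action: probability} under the baseline.
    counts: dict mapping (state, action) -> number of samples in the batch.
    Pairs seen fewer than n_min times are "bootstrapped": the new policy
    copies the baseline's probability there, and only redistributes mass
    among well-estimated actions.
    """
    policy = {}
    for state, probs in baseline.items():
        trusted = [a for a in probs if counts.get((state, a), 0) >= n_min]
        new_probs = dict(probs)  # start from the baseline distribution
        if trusted:
            # Move the trusted probability mass onto the best trusted action.
            best = max(trusted, key=lambda a: q.get((state, a), 0.0))
            trusted_mass = sum(probs[a] for a in trusted)
            for a in trusted:
                new_probs[a] = trusted_mass if a == best else 0.0
        policy[state] = new_probs
    return policy
```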
An Exploration Strategy for RL with Considerations of Budget and Risk
TLDR: Describes a new strategy that incorporates the agent's risk profile as an input to the learning framework via reward shaping, and shows that the reward-shaping process can guide the agent toward a less risky policy.
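
A toy sketch of reward shaping with a risk profile, in the spirit of that abstract: the shaped reward trades environment reward against an estimated risk term. The linear penalty and the `risk_aversion` parameter are illustrative assumptions, not the paper's formulation.

```python
def shaped_reward(reward, risk_estimate, risk_aversion=0.5):
    """Penalize estimated risk so learning favors safer behavior.

    reward: environment reward for the transition.
    risk_estimate: e.g. estimated probability of entering a failure state.
    risk_aversion: encodes the agent's risk profile (assumed scalar here);
    larger values push the learned policy toward less risky behavior.
    """
    return reward - risk_aversion * risk_estimate
```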
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
TLDR: A new way of predicting the performance of a reinforcement learning policy from historical data that may have been generated by a different policy, based on an extension of the doubly robust estimator and a new way to mix model-based estimates with importance-sampling estimates.
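
The doubly robust construction is easiest to see in its one-step (bandit) form, which the paper extends to sequential settings: a model-based term keeps variance low while an importance-weighted residual corrects the model's bias. A hedged sketch with assumed interfaces (`q_model`, the probability functions, and a finite action set):

```python
def doubly_robust_value(batch, q_model, target_prob, behavior_prob, actions):
    """One-step doubly robust estimate of a target policy's value (sketch).

    batch: iterable of (state, action, reward) logged by the behavior policy.
    q_model(state, action) -> estimated reward (the model-based part).
    target_prob / behavior_prob: action probabilities under each policy.
    actions: finite action set, used for the model-based expectation.
    """
    estimates = []
    for state, action, reward in batch:
        # Importance ratio corrects for the mismatch between policies.
        rho = target_prob(state, action) / behavior_prob(state, action)
        # Model-based estimate of the target policy's expected reward here.
        model_term = sum(target_prob(state, a) * q_model(state, a)
                         for a in actions)
        # Importance-weighted residual removes the model's bias.
        estimates.append(model_term + rho * (reward - q_model(state, action)))
    return sum(estimates) / len(estimates)
```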

References

Showing 1-10 of 82 references.
Internal Rewards Mitigate Agent Boundedness
TLDR: This work extends agent design to include the meta-optimization problem of selecting internal agent goals (rewards) that optimize the designer's goals, and empirically demonstrates several instances of common agent bounds being mitigated by general internal reward functions.
PAC-inspired Option Discovery in Lifelong Reinforcement Learning
TLDR: Provides the first formal analysis of the sample complexity (a measure of learning speed) of reinforcement learning with options, which inspires a novel option-discovery algorithm that aims to minimize overall sample complexity in lifelong reinforcement learning.
Agent Based Decision Support System Using Reinforcement Learning Under Emergency Circumstances
TLDR: Designs a novel interpretation of the Markov decision process, providing a clear mathematical formulation that connects reinforcement learning with an integrated agent system for correct patient diagnosis and treatment under emergency circumstances.
Robust reinforcement learning control with static and dynamic stability
Robust control theory is used to design stable controllers in the presence of uncertainties. This provides powerful closed-loop robustness guarantees, but can result in controllers that are …
Lyapunov Design for Safe Reinforcement Learning
TLDR: Proposes a method for constructing safe, reliable reinforcement learning agents based on Lyapunov design principles, ensuring qualitatively satisfactory agent behavior for virtually any reinforcement learning algorithm and at all times, including while the agent is learning and taking exploratory actions.
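
The core of the Lyapunov approach can be sketched in a few lines: at every step, restrict the agent to actions that do not increase a designer-supplied Lyapunov function, so any learning rule layered on top stays within the safe sublevel set it started in. The `lyapunov` and `step` names are assumed interfaces for illustration, not the paper's API.

```python
def safe_actions(state, actions, lyapunov, step):
    """Filter actions to those that do not increase a Lyapunov function.

    lyapunov(state) -> scalar "energy" that is low in safe regions.
    step(state, action) -> predicted next state (a known model is assumed).
    Keeping the Lyapunov value non-increasing along trajectories confines
    the agent to its initial safe sublevel set, whatever exploration or
    learning rule chooses among the returned actions.
    """
    v_now = lyapunov(state)
    return [a for a in actions if lyapunov(step(state, a)) <= v_now]
```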
PAC model-free reinforcement learning
TLDR: This result proves that efficient reinforcement learning is possible without learning a model of the MDP from experience, and that Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.
Reinforcement Learning: An Introduction
TLDR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the field's intellectual foundations to the most recent developments and applications.
Genetic Programming for Reward Function Search
TLDR: Presents a genetic programming algorithm that searches for alternate reward functions to improve agent learning performance, with experiments that show the superiority of the found reward functions and demonstrate the method's potential scalability.
Reachability-based safe learning with Gaussian processes
TLDR: Proposes a novel method that learns the system's unknown dynamics with a Gaussian process model, iteratively approximates the maximal safe set, and incorporates safety into the reinforcement learning performance metric, allowing a better integration of safety and learning.
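
A hedged one-dimensional sketch of the safe-set idea: the Gaussian process's confidence interval over the next state must lie entirely inside the currently known safe interval before a transition is allowed. The interval representation and the `beta` confidence multiplier are simplifying assumptions; the paper works with reachability analysis over richer sets.

```python
def is_safe_transition(gp_predict, state, action,
                       safe_low, safe_high, beta=2.0):
    """Conservative safety check from learned GP dynamics (1-D for clarity).

    gp_predict(state, action) -> (mean, std) of the predicted next state.
    Accept only if the whole beta-sigma confidence interval stays inside
    the known-safe interval [safe_low, safe_high]; beta (assumed here)
    trades exploration against conservatism.
    """
    mean, std = gp_predict(state, action)
    return safe_low <= mean - beta * std and mean + beta * std <= safe_high
```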
Reinforcement Learning in Finite MDPs: PAC Analysis
TLDR: Summarizes the current state of the art for near-optimal behavior in finite Markov decision processes with a polynomial number of samples, presenting bounds for the problem in a unified theoretical framework.