• Corpus ID: 8090283

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

@article{Saunders2017TrialWE,
  title={Trial without Error: Towards Safe Reinforcement Learning via Human Intervention},
  author={William Saunders and Girish Sastry and Andreas Stuhlm{\"u}ller and Owain Evans},
  journal={ArXiv},
  year={2017},
  volume={abs/1707.05173}
}
AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human "in the loop" and ready to intervene is currently the only way to prevent all catastrophes. We… 
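
The scheme the abstract describes can be pictured with a short sketch. The code below is an illustrative reconstruction, not the paper's implementation: the gymnasium wrapper interface, the `overseer` callable, the fallback action, and the penalty value are all assumptions.

```python
import gymnasium as gym

class HumanInterventionWrapper(gym.Wrapper):
    """Illustrative sketch of human-in-the-loop blocking: an overseer
    screens every proposed action and, if it would be catastrophic,
    replaces it with a safe fallback and adds a penalty. In the paper,
    a human plays the overseer role at first and is later imitated by
    a trained "blocker" so that oversight can be automated."""

    def __init__(self, env, overseer, safe_action, penalty=-1.0):
        super().__init__(env)
        self.overseer = overseer        # assumed: (env, action) -> bool (True = block)
        self.safe_action = safe_action  # assumed fallback when an action is blocked
        self.penalty = penalty          # assumed shaping penalty for blocked actions

    def step(self, action):
        blocked = self.overseer(self.env, action)
        if blocked:
            # The unsafe action is never executed, so training proceeds
            # "without error"; the penalty still teaches the agent to
            # stop proposing actions of this kind.
            action = self.safe_action
        obs, reward, terminated, truncated, info = self.env.step(action)
        if blocked:
            reward += self.penalty
        info["blocked"] = blocked
        return obs, reward, terminated, truncated, info
```

Once enough (state, action, blocked) labels have been collected from the human, a classifier trained on them can stand in for the overseer, which is how the paper proposes to scale up oversight.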

Citations

Benchmarking Safe Exploration in Deep Reinforcement Learning

This work proposes to standardize constrained RL as the main formalism for safe exploration, and presents the Safety Gym benchmark suite, a new slate of high-dimensional continuous control environments for measuring research progress on constrained RL.
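
Constrained RL, the formalism this benchmark standardizes, is usually stated as a constrained MDP; in my notation (not quoted from the paper), the objective is

```latex
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t}\, c(s_t, a_t)\Big] \le d,
```

where $c$ is a cost function marking unsafe events and $d$ is a fixed safety budget. Safety Gym environments expose reward and cost separately so that constrained algorithms can be compared on both axes.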

Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization

A novel human-in-the-loop learning method, Human-AI Copilot Optimization (HACO), is proposed that extracts proxy state-action values from partial human demonstrations and optimizes the agent to improve those proxy values while reducing human interventions.
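
Read at face value, the HACO objective couples those two terms; one heavily hedged way to write it (my notation, not the paper's exact loss) is

```latex
\max_{\pi} \; \mathbb{E}_{(s,a) \sim \pi}\big[\hat{Q}(s, a)\big]
\;-\; \lambda\, \mathbb{E}_{s \sim \pi}\big[\mathbf{1}\{\text{human intervenes at } s\}\big],
```

where $\hat{Q}$ is the proxy state-action value extracted from the partial demonstrations and $\lambda$ trades task progress against the cost of requiring the human to step in.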

AI Safe Exploration: Reinforced learning with a blocker in unsafe environments

The study implements an artifact meant to replace the human overseer when training an AI in simulated unsafe environments; testing the implemented blocker shows that it can avoid catastrophes and find a path to the goal in 17 out of 18 runs.

Generalizing from a few environments in safety-critical reinforcement learning

It is shown that catastrophes can be significantly reduced with simple modifications, including ensemble model averaging and the use of a blocking classifier, and that the uncertainty information from the ensemble is useful for predicting whether a catastrophe will occur within a few steps and hence whether human intervention should be requested.
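
A minimal sketch of the ensemble idea, with an assumed classifier interface (`predict_proba(state, action)` returning a catastrophe probability is my invention for illustration, as are the thresholds):

```python
import numpy as np

def should_request_intervention(ensemble, state, action,
                                risk_threshold=0.5,
                                disagreement_threshold=0.2):
    """Average the catastrophe probabilities of several independently
    trained blocking classifiers; request human intervention when the
    mean risk is high or when the members disagree (high std. dev.),
    i.e. when the ensemble is uncertain."""
    probs = np.array([clf.predict_proba(state, action) for clf in ensemble])
    return probs.mean() > risk_threshold or probs.std() > disagreement_threshold
```

The second test operationalizes the claim above that the ensemble's uncertainty, not just its mean prediction, is useful for deciding when to ask for help.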

Don't do it: Safer Reinforcement Learning With Rule-based Guidance

A new safe epsilon-greedy algorithm is proposed that uses safety rules to override agents' actions if they are considered to be unsafe and achieves better performance than the base model.
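
A sketch of one safe epsilon-greedy step under stated assumptions: `safety_rules` as a `(state, action) -> bool` predicate is my stand-in for the paper's rule format, and falling back to the full action set when no action passes is one possible design choice, not necessarily the paper's.

```python
import random

def safe_epsilon_greedy(q_values, state, safety_rules, epsilon=0.1):
    """Act epsilon-greedily, but let rule-based guidance veto actions
    judged unsafe in the current state before either the random or
    the greedy choice is made."""
    actions = range(len(q_values))
    safe_actions = [a for a in actions if safety_rules(state, a)]
    if not safe_actions:              # no rule-approved action available:
        safe_actions = list(actions)  # fall back to the full action set
    if random.random() < epsilon:
        return random.choice(safe_actions)   # explore among safe actions
    return max(safe_actions, key=lambda a: q_values[a])  # greedy safe action
```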

Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems

Results presented in this work show that a reward signal learned from human interaction accelerates the learning of reinforcement learning algorithms, and that learning from a combination of human demonstrations and interventions is faster and more sample-efficient than traditional supervised learning.

Expert Intervention Learning

This work argues that learning interactively from expert interventions enjoys the best of both worlds, and formalizes this as a constraint on the learner's value function, which it can efficiently learn using no-regret online learning techniques.

Model-Free Reinforcement Learning for Real-World Robots

This thesis considers real-world tasks that may benefit from reinforcement learning but for which no simulator is available, and introduces Bootstrapped Dual Policy Iteration (BDPI), a model-free actor-critic algorithm that allows an agent to learn partially observable tasks in a sample-efficient way.

Provable Safe Reinforcement Learning with Binary Feedback

A novel meta-algorithm, SABRE, is proposed that can be applied to any MDP setting given access to a black-box PAC RL algorithm for that setting, and is guaranteed to return a near-optimal safe policy with high probability.
...

References

Combating Deep Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

This paper learns a reward shaping that accelerates learning and guards oscillating policies against repeated catastrophes, and introduces intrinsic fear, a new method that mitigates these problems by avoiding dangerous states.
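
The intrinsic-fear idea can be summarized in one shaped-reward equation (my notation): a "fear model" $F(s)$ is trained to predict the probability of reaching a catastrophe within $k$ steps of $s$, and the agent is trained on

```latex
\tilde{r}(s, a, s') = r(s, a) - \lambda \, F(s'),
```

so states the fear model flags as precursors to catastrophe become intrinsically repulsive, with $\lambda$ controlling how cautious the resulting policy is.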

Deep Reinforcement Learning from Human Preferences

This work explores goals defined in terms of (non-expert) human preferences between pairs of trajectory segments in order to effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion.
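
Concretely, the method fits a reward model $\hat{r}$ to human comparisons between trajectory segments $\sigma^1, \sigma^2$ under a Bradley-Terry-style preference model:

```latex
\hat{P}\big[\sigma^1 \succ \sigma^2\big] =
\frac{\exp \sum_t \hat{r}(s^1_t, a^1_t)}
     {\exp \sum_t \hat{r}(s^1_t, a^1_t) + \exp \sum_t \hat{r}(s^2_t, a^2_t)},
```

with $\hat{r}$ trained by cross-entropy against the human's choices and the policy then trained by ordinary RL on the learned $\hat{r}$.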

Deep Learning of Robotic Tasks without a Simulator using Strong and Weak Human Supervision

This work implements the last four elements of its scheme using deep convolutional networks and applies them to create a computerized agent capable of autonomous highway steering in the well-known racing game Assetto Corsa.

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Concrete Problems in AI Safety

A list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function, an objective function that is too expensive to evaluate frequently, or undesirable behavior during the learning process, is presented.

Agent-Agnostic Human-in-the-Loop Reinforcement Learning

This work explores protocol programs, an agent-agnostic schema for Human-in-the-Loop Reinforcement Learning, to incorporate the beneficial properties of a human teacher into reinforcement learning without making strong assumptions about the inner workings of the agent.

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

This paper investigates settings where the sequence of states traversed in simulation remains reasonable for the real world even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass, and geometry properties.
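
The transfer recipe can be compressed to one equation (notation mine): learn a deep inverse dynamics model $f_\theta$ on the real system, then track the simulator's state sequence by choosing

```latex
a^{\text{real}}_t = f_\theta\big(s_t, \, s^{\text{sim}}_{t+1}\big),
```

i.e. the real-world action predicted to carry the current state to the next state the simulation-trained policy would have visited.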

Model-Free Episodic Control

This work demonstrates that a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks: it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
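
A compact sketch of the episodic-control mechanism described above, with the state embedding and nearest-neighbor machinery simplified away (the dictionary keying and the default value for unvisited pairs are my assumptions):

```python
import numpy as np
from collections import defaultdict

class EpisodicController:
    """Tabular caricature of episodic control: remember the best return
    ever obtained after taking each action in each (embedded) state,
    and act greedily with respect to those memories."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.memory = defaultdict(dict)  # action -> {state_key: best return so far}

    def update(self, state_key, action, episodic_return):
        # Keep the maximum Monte-Carlo return seen from this pair.
        best = self.memory[action].get(state_key, -np.inf)
        self.memory[action][state_key] = max(best, episodic_return)

    def act(self, state_key):
        # Unvisited pairs default to 0.0 here; the full method instead
        # falls back on nearest neighbors in an embedding space.
        values = [self.memory[a].get(state_key, 0.0) for a in range(self.n_actions)]
        return int(np.argmax(values))
```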