Corpus ID: 208617508

Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Samuel Ainsworth, Matt Barnes, Siddhartha S. Srinivasa
In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops…
Abstraction-Guided Policy Recovery from Expert Demonstrations
This work presents Abstraction-Guided Policy Recovery from Expert Demonstrations, a meta-modelling framework for guiding policy recovery from expert demonstrations in the context of knowledge-based decision-making.
MOReL: Model-Based Offline Reinforcement Learning
Theoretically, it is shown that MOReL is minimax optimal (up to log factors) for offline RL, and through experiments, it matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.


Safe Exploration of State and Action Spaces in Reinforcement Learning
The PI-SRL algorithm is introduced, which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment.
Practical Reinforcement Learning in Continuous Spaces
This paper introduces an algorithm that safely approximates the value function for continuous state control tasks, and that learns quickly from a small amount of data, and gives experimental results using this algorithm to learn policies for both a simulated task and also for a real robot, operating in an unaltered environment.
Exploration and apprenticeship learning in reinforcement learning
This paper considers the apprenticeship learning setting in which a teacher demonstration of the task is available, and shows that, given the initial demonstration, no explicit exploration is necessary, and the student can attain near-optimal performance simply by repeatedly executing "exploitation policies" that try to maximize rewards.
Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
This work proposes an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt.
A comprehensive survey on safe reinforcement learning
This work categorizes and analyzes two approaches to safe reinforcement learning: modification of the optimality criterion (the classic discounted finite/infinite-horizon objective) with a safety factor, and incorporation of external knowledge or the guidance of a risk metric.
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
This paper proposes a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting, and demonstrates that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
Deep Q-learning From Demonstrations
This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages relatively small sets of demonstration data to massively accelerate the learning process, and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
Variance Reduction Methods for Sublinear Reinforcement Learning
This work considers the problem of provably optimal reinforcement learning for (episodic) finite-horizon MDPs, i.e. how an agent learns to maximize his/her (long-term) reward in an uncertain…
SHIV: Reducing supervisor burden in DAgger using support vectors for efficient learning from demonstrations in high dimensional state spaces
The SHIV algorithm (SVM-based reduction in Human InterVention), which converges to a single policy and reduces supervisor burden in non-stationary high-dimensional state distributions, is introduced.
Apprenticeship learning via inverse reinforcement learning
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.