Shared Autonomy via Deep Reinforcement Learning

@article{Reddy2018SharedAV,
  title={Shared Autonomy via Deep Reinforcement Learning},
  author={Siddharth Reddy and Sergey Levine and Anca D. Dragan},
  journal={ArXiv},
  year={2018},
  volume={abs/1802.01744}
}
In shared autonomy, user input is combined with semi-autonomous control to achieve a common goal. We use human-in-the-loop reinforcement learning with neural network function approximation to learn an end-to-end mapping from environmental observation and user input to agent action, with task reward as the only form of supervision. Controlled studies with users (n = 16) and synthetic pilots playing a video game and flying a real quadrotor demonstrate the ability of our algorithm to assist users…
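The abstract's end-to-end mapping can be made concrete with a small sketch: a Q-network that consumes the concatenated environmental observation and user input, plus an action-selection rule that defers to the user whenever their command is close to optimal. This is a minimal illustration in PyTorch, assuming a discrete action space; the layer sizes, the `margin` threshold, and the index-distance closeness measure are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SharedAutonomyQNet(nn.Module):
    """Q-network over the concatenation of observation and user input."""
    def __init__(self, obs_dim, user_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + user_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, user_input):
        return self.net(torch.cat([obs, user_input], dim=-1))

def assistive_action(qnet, obs, user_input, user_action, margin=1.0):
    """Return the action closest to the user's choice among near-optimal ones."""
    with torch.no_grad():
        q = qnet(obs, user_input).squeeze(0)
    # Actions whose Q-value is within `margin` of the best are "good enough".
    feasible = (q >= q.max() - margin).nonzero(as_tuple=True)[0]
    # Among those, defer to the one closest to the user's command
    # (index distance stands in for a task-appropriate similarity).
    return int(feasible[(feasible - user_action).abs().argmin()])

# Usage with dummy tensors:
qnet = SharedAutonomyQNet(obs_dim=8, user_dim=2, n_actions=5)
print(assistive_action(qnet, torch.randn(1, 8), torch.randn(1, 2), user_action=3))
```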

Citations

Residual Policy Learning for Shared Autonomy
TLDR
A model-free, residual policy learning algorithm for shared autonomy that alleviates the need for restrictive assumptions about the goal space, environment dynamics, or human policy, and significantly improves task performance without any knowledge of the human's goal beyond the task's constraints.
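A minimal sketch of the residual idea, assuming continuous actions: the learned policy outputs a correction that is added to the human's raw command, so assistance degrades gracefully to pure teleoperation when the residual is zero. The clipping bounds and the `residual_policy` callable are illustrative stand-ins, not the paper's interface.

```python
import numpy as np

def residual_shared_control(obs, human_action, residual_policy, low=-1.0, high=1.0):
    """Add a learned correction to the human's command (hypothetical interface)."""
    correction = residual_policy(obs, human_action)  # trained model-free, from task reward
    return np.clip(human_action + correction, low, high)

# With a zero residual the human retains full control:
print(residual_shared_control(np.zeros(4), np.array([0.3, -0.8]),
                              lambda o, a: np.zeros_like(a)))
```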
A Framework for Learning From Demonstration With Minimal Human Effort
TLDR
This work considers robot learning in the context of shared autonomy, where control of the system can switch between a human teleoperator and autonomous control, and learns to predict the success probability for each controller, given the initial state of an episode.
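One way to read this is as a learned switch: per-controller success predictors are fit on initial states, and the human teleoperator is consulted only when the autonomous controller looks unlikely to succeed. The sketch below assumes `predictors` maps controller names to callables returning P(success | initial state); the names and threshold are hypothetical.

```python
def select_controller(initial_state, predictors, threshold=0.5):
    """Switch to the human teleoperator only when autonomy looks likely to fail."""
    p_auto = predictors["autonomous"](initial_state)
    # Preferring autonomy whenever it is predicted to succeed keeps
    # the demand on the human teleoperator minimal.
    return "autonomous" if p_auto >= threshold else "human_teleop"

# Usage with a stub predictor:
print(select_controller([0.1, 0.2], {"autonomous": lambda s: 0.3}))  # -> human_teleop
```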
A Differentiable Policy for Shared Autonomy
TLDR
Initial results teleoperating a gripper in a virtual environment using pre-training and hand tuning of the arbitration function demonstrate the efficacy of the approach when the intent inference module is trained on a task similar to the one performed at test time.
Human-AI Shared Control via Frequency-based Policy Dissection
TLDR
The experiments show that the human-AI shared control achieved by Policy Dissection in the driving task can substantially improve performance and safety in unseen traffic scenes, and suggest a promising direction for implementing human-AI shared autonomy by interpreting the learned representations of autonomous agents.
Real-World Human-Robot Collaborative Reinforcement Learning*
TLDR
This work presents a real-world setup of a human-robot collaborative maze game, designed to be non-trivial and solvable only through collaboration, by limiting the actions to rotations about two orthogonal axes and assigning each axis to one player.
Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning Systems
TLDR
Results presented in this work show that a reward signal learned from human interaction accelerates the learning of reinforcement learning algorithms, and that learning from a combination of human demonstrations and interventions is faster and more sample-efficient than traditional supervised learning algorithms.
Learn Task First or Learn Human Partner First? Deep Reinforcement Learning of Human-Robot Cooperation in Asymmetric Hierarchical Dynamic Task
TLDR
It is hypothesized that the robot needs to learn the task separately from learning the behavior of its human partner to improve learning efficiency and outcomes, and a novel hierarchical reward mechanism with a task decomposition method is developed.
Augmenting Human Control through Machine Learning
TLDR
This dissertation aims to develop algorithms that endow machines with the ability to collaborate with humans on challenging sequential decision-making problems in robotic control, by advancing the state of the art in machine learning, robotics, and cognitive science.
On Optimizing Interventions in Shared Autonomy
TLDR
This work proposes two model-free reinforcement learning methods that can account for both hard and soft constraints on the number of interventions by the autonomous agent, and shows that the approach not only outperforms the existing baseline but also eliminates the need to manually tune a black-box hyperparameter controlling the level of assistance.
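A sketch of the hard-constraint case, under the assumption of discrete actions and learned Q-values: the agent overrides the pilot only while an intervention budget remains and the pilot's action is meaningfully sub-optimal. The `margin` and budget bookkeeping are illustrative; the soft-constraint variant would instead penalize interventions in the reward.

```python
import numpy as np

def maybe_intervene(q, pilot_action, interventions_left, margin=0.0):
    """Override the pilot only within a hard budget of interventions."""
    best = int(np.argmax(q))
    if interventions_left > 0 and q[best] - q[pilot_action] > margin:
        return best, interventions_left - 1   # spend one intervention
    return pilot_action, interventions_left   # defer to the pilot

action, left = maybe_intervene(np.array([0.1, 0.9, 0.2]), pilot_action=0,
                               interventions_left=2)
print(action, left)  # -> 1 1
```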
Learning Arbitration for Shared Autonomy by Hindsight Data Aggregation
TLDR
A shared control policy is defined that blends between direct user control and autonomous control based on user intent inference, for the teleoperation of pick-and-place tasks.

References

Showing 1-10 of 33 references
Shared Autonomy via Hindsight Optimization
TLDR
The problem of shared autonomy is formulated as a Partially Observable Markov Decision Process with uncertainty over the user's goal, and maximum entropy inverse optimal control is utilized to estimate a distribution over the user's goal based on the history of inputs.
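The inference step can be sketched as a Bayesian filter over candidate goals: under a maximum-entropy model, user inputs that look near-optimal for a goal raise that goal's posterior weight. Here `log_likelihood(u, g)` is an assumed callable standing in for the MaxEnt IOC input model.

```python
import numpy as np

def goal_posterior(log_prior, inputs, log_likelihood):
    """Distribution over candidate goals given the history of user inputs."""
    log_post = np.asarray(log_prior, dtype=float).copy()
    for u in inputs:
        log_post += np.array([log_likelihood(u, g) for g in range(len(log_post))])
    log_post -= log_post.max()   # numerical stability before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Two goals; inputs consistently favor goal 1 under a toy likelihood:
post = goal_posterior([0.0, 0.0], inputs=[+1, +1, +1],
                      log_likelihood=lambda u, g: 0.5 * u if g == 1 else 0.0)
print(post)  # goal 1 dominates
```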
A policy-blending formalism for shared control
TLDR
This work proposes an intuitive formalism that captures assistance as policy blending, illustrates how some of the existing techniques for shared control instantiate it, and provides a principled analysis of its main components: prediction of user intent and its arbitration with the user input.
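The canonical instantiation of this formalism is a linear blend, u_final = (1 − α)·u_user + α·u_robot, where the arbitration weight α typically grows with the confidence of the intent prediction. The confidence-to-α mapping below is an assumption; the formalism itself leaves it open.

```python
import numpy as np

def blend(user_cmd, robot_cmd, confidence):
    """Linear policy blending with arbitration weight tied to intent confidence."""
    alpha = float(np.clip(confidence, 0.0, 1.0))  # assumed arbitration function
    return (1.0 - alpha) * np.asarray(user_cmd) + alpha * np.asarray(robot_cmd)

# Low confidence keeps the user in charge; high confidence hands over control:
print(blend([1.0, 0.0], [0.0, 1.0], confidence=0.2))  # mostly the user's command
```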
Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
TLDR
An extension of the TAMER framework that leverages the representational power of deep neural networks to learn complex tasks in a short amount of time with a human trainer; its success is demonstrated by using just 15 minutes of human-provided feedback to train an agent that performs better than humans on the Atari game of Bowling.
Reinforcement learning from simultaneous human and MDP reward
TLDR
A novel algorithm is introduced that shares the same spirit as TAMER+RL but learns simultaneously from both reward sources, enabling the human feedback to come at any time during the reinforcement learning process.
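One common instantiation of learning from both sources at once is reward shaping: the scalar human feedback is simply added to the MDP reward inside an otherwise standard update. The additive form and the weight `beta` below are one possibility, not the paper's only combination method.

```python
def q_update(q, s, a, r_env, r_human, s_next, alpha=0.1, gamma=0.99, beta=1.0):
    """Tabular Q-learning step on a reward shaped by simultaneous human feedback."""
    target = (r_env + beta * r_human) + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])

# q is a dict-of-dicts: q[state][action] -> value. Human feedback can be 0 at
# steps where none arrives, so updates proceed at any time during learning.
q = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 0.0}}
q_update(q, s=0, a=1, r_env=0.0, r_human=1.0, s_next=1)
print(q[0][1])  # nudged upward by the human's positive feedback
```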
Human-level control through deep reinforcement learning
TLDR
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Interactively shaping agents via human reinforcement: the TAMER framework
TLDR
Results from two domains demonstrate that lay users can train TAMER agents without defining an environmental reward function (as in an MDP) and indicate that human training within the TAMER framework can reduce sample complexity over autonomous learning algorithms.
Explore, Exploit or Listen: Combining Human Feedback and Policy Model to Speed up Deep Reinforcement Learning in 3D Worlds
We describe a method to use discrete human feedback to enhance the performance of deep learning agents in virtual three-dimensional environments by extending deep-reinforcement learning to model the confidence and consistency of human feedback.
Reinforcement learning for robots using neural networks
TLDR
This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling its application to complex robot-learning problems.
Comparing human-centric and robot-centric sampling for robot deep learning from demonstrations
TLDR
It is observed in simulation that, for linear SVMs, policies learned with robot-centric (RC) sampling outperformed those learned with human-centric (HC) sampling, but that with deep models this advantage disappears; it is also proved that there exists a class of examples in which, in the limit, HC is guaranteed to converge to an optimal policy while RC may fail to converge.
Deep Recurrent Q-Learning for Partially Observable MDPs
TLDR
The effect of adding recurrence to a Deep Q-Network is investigated by replacing the first post-convolutional fully-connected layer with a recurrent LSTM, which successfully integrates information through time and replicates DQN's performance on standard Atari games and on partially observed equivalents featuring flickering game screens.
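A sketch of that architectural change in PyTorch, assuming the common 84×84 Atari preprocessing: the convolutional trunk matches DQN's, but the first fully-connected layer is replaced by an LSTM that carries state across time steps. Layer sizes follow the usual Atari setup and are assumptions here.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """DQN with its first post-convolutional FC layer replaced by an LSTM."""
    def __init__(self, n_actions, in_channels=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64 * 7 * 7, 512, batch_first=True)
        self.head = nn.Linear(512, n_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, channels, 84, 84); recurrence integrates
        # information across partially observed (e.g., flickering) frames.
        b, t = frames.shape[:2]
        feats = self.conv(frames.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, hidden = self.lstm(feats, hidden)
        return self.head(out), hidden

net = DRQN(n_actions=4)
q_seq, h = net(torch.randn(2, 5, 1, 84, 84))   # batch of 2, sequences of 5 frames
print(q_seq.shape)                             # torch.Size([2, 5, 4])
```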