Corpus ID: 1301081

Agent-Agnostic Human-in-the-Loop Reinforcement Learning

@article{Abel2017AgentAgnosticHR,
  title={Agent-Agnostic Human-in-the-Loop Reinforcement Learning},
  author={David Abel and John Salvatier and Andreas Stuhlm{\"u}ller and Owain Evans},
  journal={ArXiv},
  year={2017},
  volume={abs/1701.04079}
}
Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. Prior work has developed teaching protocols that enable agents to learn efficiently in complex environments; many of these methods tailor the teacher's guidance to agents with a particular representation or underlying learning scheme, offering effective but specialized teaching procedures. In this work, we explore protocol programs, an agent-agnostic schema for Human-in-the-Loop…
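The abstract's protocol-program idea lends itself to a sketch: a mediator that sits between a human teacher and an arbitrary agent, intervening only through the environment's observation, action, and reward channels, so nothing depends on the agent's internal representation. The Python below is an illustrative sketch under that reading, not the paper's actual interface; `filter_action` and `shape_reward` are hypothetical hooks.

```python
# Hypothetical sketch of an agent-agnostic teaching protocol: the protocol
# wraps the environment loop and may override actions or rewards, without
# ever inspecting the agent's internals. All names are illustrative.

def run_protocol(env, agent, teacher, num_steps):
    """Mediate between a teacher and an arbitrary agent via the env interface."""
    obs = env.reset()
    for _ in range(num_steps):
        action = agent.act(obs)
        # The teacher sees only the (obs, action) channel and may intervene,
        # e.g. by substituting a safer action.
        action = teacher.filter_action(obs, action)
        next_obs, reward, done = env.step(action)
        # The teacher may also shape the reward signal the agent receives.
        reward = teacher.shape_reward(obs, action, reward)
        agent.update(obs, action, reward, next_obs)
        obs = next_obs if not done else env.reset()
```

Because the mediation happens entirely at the interface, the same protocol can wrap a tabular learner, a deep RL agent, or anything else with `act`/`update` methods.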
Learning Shaping Strategies in Human-in-the-loop Interactive Reinforcement Learning
An adaptive shaping algorithm is proposed that learns the most suitable shaping method online; its effectiveness is verified in both simulated and real human studies, shedding light on the role and impact of human factors in human-robot collaborative learning.
Leveraging Human Guidance for Deep Reinforcement Learning Tasks
This survey provides a high-level overview of five recent learning frameworks that rely primarily on human guidance other than conventional, step-by-step action demonstrations, and reviews the motivation, assumptions, and implementation of each framework.
Learning from Human Feedback: A Comparison of Interactive Reinforcement Learning Algorithms
Reinforcement Learning potentially provides a powerful tool for self-improvement of future robots. However, in contrast to learning in simulation, on real robots it is much more important to be…
Recent advances in leveraging human guidance for sequential decision-making tasks
This survey provides a high-level overview of five recent machine learning frameworks that rely primarily on human guidance apart from pre-specified reward functions or conventional, step-by-step action demonstrations.
Multi-Channel Interactive Reinforcement Learning for Sequential Tasks
Experimental evaluations show that the approach can successfully incorporate human input to accelerate learning in both robotic tasks, even when that input is partially wrong, and can inform the future design of algorithms and interfaces for interactive reinforcement learning systems used by inexperienced users.
Scalable agent alignment via reward modeling: a research direction
This work outlines a high-level research direction for solving the agent alignment problem, centered on reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning.
A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems
A taxonomy of solutions for the general knowledge reuse problem is defined, providing a comprehensive discussion of recent progress on knowledge reuse in Multiagent Systems (MAS) and of techniques for knowledge reuse across agents (whether or not they act in a shared environment).
Safe Driving via Expert Guided Policy Optimization
  • Zhenghao Peng, Quanyi Li, Chunxiao Liu, Bolei Zhou
  • Computer Science
  • ArXiv
  • 2021
A novel Expert Guided Policy Optimization method integrates a guardian into the reinforcement learning loop, composed of an expert policy that generates demonstrations and a switch function that decides when to intervene.
Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies
An iterative policy optimization algorithm is proposed that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set.
Pitfalls of learning a reward function online
This work considers a continual ("one life") learning approach in which the agent both learns the reward function and optimizes for it at the same time, and formally introduces two desirable properties, including "unriggability", which prevents the agent from steering the learning process toward a reward function that is easier to optimize.

References

Showing 1-10 of 58 references
Principled Methods for Advising Reinforcement Learning Agents
This paper presents a method for incorporating arbitrary advice into the reward structure of a reinforcement learning agent without altering the optimal policy, and develops two qualitatively different methods for converting a potential function into advice for the agent.
Augmenting Reinforcement Learning with Human Feedback
As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users…
Interactively shaping agents via human reinforcement: the TAMER framework
Results from two domains demonstrate that lay users can train TAMER agents without defining an environmental reward function (as in an MDP), and indicate that human training within the TAMER framework can reduce sample complexity relative to autonomous learning algorithms.
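The TAMER idea summarized above, learning a model of the human's reinforcement signal and acting greedily on it, can be sketched in tabular form. This is a simplified stand-in assuming a discrete state space; the class and parameter names are illustrative, not the framework's actual API.

```python
# Minimal sketch of the TAMER idea: regress a model H(s, a) of the human's
# scalar feedback and act greedily on it. Tabular stand-in for the
# function approximator used in practice; names are illustrative.
from collections import defaultdict

class TabularTamer:
    def __init__(self, actions, lr=0.1):
        self.actions = actions
        self.lr = lr
        self.H = defaultdict(float)  # estimated human reinforcement H(s, a)

    def act(self, state):
        # Greedy with respect to the learned human-reward model; no
        # discounting, since the feedback is treated as a direct judgment
        # of the action rather than a long-term return.
        return max(self.actions, key=lambda a: self.H[(state, a)])

    def update(self, state, action, human_signal):
        # Move H(s, a) toward the scalar feedback the trainer just gave.
        err = human_signal - self.H[(state, action)]
        self.H[(state, action)] += self.lr * err
```

Note the contrast with ordinary Q-learning: there is no bootstrapped target, only supervised regression onto the trainer's signal.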
Creating Advice-Taking Reinforcement Learners
This work presents and evaluates a design that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given at any time, and in a natural manner, by an external observer, and shows that, given good advice, a learner can achieve statistically significant gains in expected reward.
Policy Shaping: Integrating Human Feedback with Reinforcement Learning
This paper introduces Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by using it as direct policy labels, and shows that it can outperform state-of-the-art approaches and is robust to infrequent and inconsistent human feedback.
Teaching on a budget: agents advising agents in reinforcement learning
It is shown that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
Transfer Learning in Reinforcement Learning
Transfer learning in reinforcement learning is an area of research that seeks to speed up or improve learning of a complex target task by leveraging knowledge from one or more source tasks. This…
Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance
The importance of understanding the human-teacher/robot-learner system as a whole is demonstrated, in order to design algorithms that support how people want to teach while simultaneously improving the robot's learning performance.
Dynamic potential-based reward shaping
This paper proves and demonstrates a method for extending potential-based reward shaping to allow dynamic shaping while maintaining the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
Safe reinforcement learning in high-risk tasks through policy improvement
This paper defines the concept of risk and addresses the problem of safe exploration in the context of RL, introducing an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks and learns efficiently from experience gathered from the environment.