Corpus ID: 235731792

Here's What I've Learned: Asking Questions that Reveal Reward Learning

@article{Habibian2021HeresWI,
  title={Here's What I've Learned: Asking Questions that Reveal Reward Learning},
  author={Soheil Habibian and Ananth Jonnavittula and Dylan P. Losey},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.01995}
}
Robots can learn from humans by asking questions. In these questions the robot demonstrates a few different behaviors and asks the human for their favorite. But how should robots choose which questions to ask? Today’s robots optimize for informative questions that actively probe the human’s preferences as efficiently as possible. But while informative questions make sense from the robot’s perspective, human onlookers often find them arbitrary and misleading. For example, consider an assistive…
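The "informative questions" the abstract refers to are typically chosen by maximizing expected information gain under a belief over the human's reward function. Below is a minimal sketch of that standard pipeline (not this paper's contribution), assuming a linear reward over hand-designed trajectory features, a particle belief over the weights, and a Boltzmann-rational answer model; all names and constants are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Belief over reward weights w for a linear reward R(xi) = w . phi(xi),
# represented by weighted particles on the unit sphere.
N_PARTICLES, N_FEATURES = 500, 3
particles = rng.normal(size=(N_PARTICLES, N_FEATURES))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
belief = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

def answer_prob(phi_a, phi_b, w, beta=5.0):
    # Boltzmann-rational probability that the human prefers trajectory A over B.
    return 1.0 / (1.0 + np.exp(-beta * (w @ (phi_a - phi_b))))

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def info_gain(phi_a, phi_b, particles, belief):
    # Expected information gain of asking "A or B?": mutual information
    # between the human's answer and the reward weights.
    p_a = answer_prob(phi_a, phi_b, particles)        # P(answer = A | w), one value per particle
    marg_a = np.sum(belief * p_a)                     # P(answer = A) under the current belief
    return binary_entropy(marg_a) - np.sum(belief * binary_entropy(p_a))

def update_belief(phi_a, phi_b, prefers_a, particles, belief):
    # Bayesian update of the particle weights after observing the answer.
    p_a = answer_prob(phi_a, phi_b, particles)
    likelihood = p_a if prefers_a else 1.0 - p_a
    belief = belief * likelihood
    return belief / belief.sum()

# One round: score candidate queries (pairs of trajectory feature vectors),
# ask the most informative one, and fold the answer back into the belief.
candidates = [(rng.normal(size=N_FEATURES), rng.normal(size=N_FEATURES)) for _ in range(20)]
phi_a, phi_b = max(candidates, key=lambda q: info_gain(q[0], q[1], particles, belief))
prefers_a = True                                      # stand-in for the human's actual answer
belief = update_belief(phi_a, phi_b, prefers_a, particles, belief)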


Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences
TLDR
This work presents an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then proactively probes the user with preference queries to zero in on their true reward, enabling a framework that integrates multiple sources of information, either passively or actively collected from human users.
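A rough sketch of the approach this summary describes, assuming the same kind of linear-reward, particle-belief setup sketched after the abstract above: demonstrations reweight an initial belief through a Boltzmann likelihood, and preference queries then refine it. The finite comparison set, the stand-in "true" weights, and all names are illustrative simplifications.

import numpy as np

rng = np.random.default_rng(1)
N_PARTICLES, N_FEATURES = 500, 3
particles = rng.normal(size=(N_PARTICLES, N_FEATURES))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
belief = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

def demo_likelihood(phi_demo, phi_alternatives, w, beta=1.0):
    # Boltzmann likelihood of a demonstration: the human is exponentially more
    # likely to demonstrate higher-reward trajectories, normalized over a
    # finite set of alternatives (a common approximation).
    return np.exp(beta * (phi_demo @ w)) / np.exp(beta * (phi_alternatives @ w)).sum()

# Step 1: initialize the belief from one demonstration.
phi_alternatives = rng.normal(size=(50, N_FEATURES))   # sampled comparison trajectories
w_human = np.array([1.0, 0.5, -0.2])                   # stand-in for the human's true preferences
phi_demo = phi_alternatives[np.argmax(phi_alternatives @ w_human)]
likelihood = np.array([demo_likelihood(phi_demo, phi_alternatives, w) for w in particles])
belief = belief * likelihood
belief /= belief.sum()

# Step 2: refine the belief with active preference queries, e.g. using the
# information-gain selection and Bayesian update sketched after the abstract.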
APReL: A Library for Active Preference-based Reward Learning Algorithms
TLDR
APReL, a library for active preference-based reward learning algorithms, is presented; it enables researchers and practitioners to experiment with existing techniques and to easily develop their own algorithms for the various modules of the problem.
Joint Communication and Motion Planning for Cobots
TLDR
A joint communication and motion planning framework is presented that selects from an arbitrary input set of the robot’s communication signals while computing its motion plans, and that supports specifying a variety of social/workplace compliance priorities through a flexible cost function.
Evaluation of Two Complementary Modeling Approaches for Fiber-Reinforced Soft Actuators
TLDR
This paper develops and tests both a dynamic lumped-parameter model and a finite element model in an attempt to understand the practicability of Fiber Reinforced Elastomeric Enclosures for use in a soft robotic arm, and proposes that designers can leverage multiple models to fill gaps in their understanding of soft robots.

References

SHOWING 1-10 OF 52 REFERENCES
Asking Easy Questions: A User-Friendly Approach to Active Reward Learning
TLDR
This paper explores an information gain formulation for optimally selecting questions that naturally accounts for the human's ability to answer, and determines when asking further questions becomes redundant or costly.
Designing robot learners that ask good questions
  M. Cakmak, A. Thomaz • 2012 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI)
TLDR
This paper identifies three types of questions (label, demonstration, and feature queries), discusses how a robot can use these while learning new skills, and provides guidelines for designing question-asking behaviors on a robot learner.
When Humans Aren’t Optimal: Robots that Collaborate with Risk-Aware Humans
TLDR
Overall, this paper extends existing rational human models so that collaborative robots can anticipate and plan around suboptimal human behavior during HRI, and finds that this increased modeling accuracy results in safer and more efficient human-robot collaboration.
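The contrast this summary draws, between the usual noisy-rational human model and a risk-aware one, can be illustrated with a toy two-outcome gamble. The sketch below uses a simplified prospect-theory-style probability weighting (the cited paper uses cumulative prospect theory); the constants and dollar amounts are illustrative.

import numpy as np

def noisy_rational_choice(values, beta=0.05):
    # Standard Boltzmann model: choice probabilities from (subjective) values.
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

def prospect_weight(p, gamma=0.61):
    # Kahneman-Tversky-style probability weighting: small probabilities are
    # overweighted and large ones underweighted.
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def risk_aware_value(outcomes, probs):
    # Subjective value of a gamble under probability weighting
    # (a simplified stand-in for cumulative prospect theory).
    return sum(prospect_weight(p) * o for o, p in zip(outcomes, probs))

# A sure $50 versus a 50/50 gamble between $0 and $110.
sure = ([50.0], [1.0])
gamble = ([0.0, 110.0], [0.5, 0.5])
expected = [sum(o * p for o, p in zip(*opt)) for opt in (sure, gamble)]      # [50, 55]
subjective = [risk_aware_value(*opt) for opt in (sure, gamble)]              # roughly [50, 46]
print("noisy-rational P(sure, gamble):", noisy_rational_choice(expected))    # favors the gamble
print("risk-aware     P(sure, gamble):", noisy_rational_choice(subjective))  # favors the sure thing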
Enabling Robots to Communicate Their Objectives
TLDR
It is shown that certain approximate-inference models lead to the robot generating example behaviors that better enable users to anticipate what it will do in novel situations; the results suggest, however, that additional research is needed on modeling how humans extrapolate from examples of robot behavior.
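A compact sketch of the example-selection idea summarized here: choose the demonstration, from a candidate set, that makes a modeled observer's inference land closest to the robot's true objective. The Boltzmann-style observer model, the scoring rule, and all names below are illustrative assumptions, not the paper's exact formulation.

import numpy as np

rng = np.random.default_rng(2)
N_FEATURES = 3
w_true = np.array([1.0, -0.5, 0.2])                    # robot's actual objective (illustrative)
hypotheses = rng.normal(size=(100, N_FEATURES))        # observer's candidate reward weights
hypotheses /= np.linalg.norm(hypotheses, axis=1, keepdims=True)
candidates = rng.normal(size=(30, N_FEATURES))         # feature vectors of candidate example behaviors

def observer_posterior(phi_shown, phi_alternatives, hypotheses, beta=2.0):
    # Approximate-inference observer: assumes the robot exponentially prefers
    # higher-reward behavior and does a Bayesian update over its hypotheses.
    likelihood = np.exp(beta * hypotheses @ phi_shown)
    likelihood /= np.exp(beta * hypotheses @ phi_alternatives.T).sum(axis=1)
    return likelihood / likelihood.sum()

def teaching_score(phi):
    # How close the observer's inferred objective lands to the true one
    # after seeing this example (higher is better).
    posterior = observer_posterior(phi, candidates, hypotheses)
    w_hat = posterior @ hypotheses
    return -np.linalg.norm(w_hat / np.linalg.norm(w_hat) - w_true / np.linalg.norm(w_true))

best_example = max(candidates, key=teaching_score)     # the behavior the robot should show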
Active Preference-Based Learning of Reward Functions
TLDR
This work builds on label ranking and proposes to learn from preferences (or comparisons) instead: the person provides the system with a relative preference between two trajectories. It takes an active learning approach in which the system decides which preference queries to make.
Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences
TLDR
This work presents an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then proactively probes the user with preference queries to zero in on their true reward, enabling a framework that integrates multiple sources of information, either passively or actively collected from human users.
Robots that Take Advantage of Human Trust
  Dylan P. Losey, D. Sadigh • 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
TLDR
An offline linear-quadratic case study and a real-time user study show that trusting human models can naturally lead to communicative robot behavior, which influences end-users and increases their involvement.
Learning preferences for manipulation tasks from online coactive feedback
TLDR
This work proposes a coactive online learning framework for teaching preferences in contextually rich environments, implements its algorithm on two high-degree-of-freedom robots, PR2 and Baxter, and presents three intuitive mechanisms for providing incremental feedback.
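A bare-bones sketch of the perceptron-style coactive update this summary refers to: when the user incrementally improves the robot's proposed trajectory, the reward weights shift toward the improved trajectory's features. It assumes a linear reward; names and feature values are illustrative.

import numpy as np

def coactive_update(w, phi_proposed, phi_improved, alpha=0.5):
    # Perceptron-style coactive learning step: shift the reward weights toward
    # the features of the user-improved trajectory.
    return w + alpha * (phi_improved - phi_proposed)

# One round: the robot proposes a trajectory, the user nudges it (e.g. farther
# from a table edge), and the weights move toward the corrected behavior.
w = np.zeros(3)
phi_proposed = np.array([0.8, 0.1, 0.3])    # features of the robot's plan (illustrative)
phi_improved = np.array([0.8, 0.4, 0.2])    # features after the user's incremental feedback
w = coactive_update(w, phi_proposed, phi_improved)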
Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries
TLDR
It is proposed that users can easily provide much richer information than pairwise comparisons, that robots ought to leverage it, and that feature-augmented queries can extract more information faster, leading to robots whose behavior better matches user preferences.
Learning from Physical Human Corrections, One Feature at a Time
TLDR
The approach allows the human-robot team to focus on learning one feature at a time, unlike state-of-the-art techniques that update all features at once, and user studies suggest that people teaching one feature at a time perform better, especially in tasks that require changing multiple features.
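A small sketch of the one-feature-at-a-time idea: from a physical correction, pick the single feature the correction most plausibly targets and update only that weight. The selection rule below (largest feature change) is a simplification of the cited approach, and all names and values are illustrative.

import numpy as np

def one_feature_update(w, phi_before, phi_after, alpha=1.0):
    # Update only the single reward weight whose feature the correction changed
    # the most, instead of updating every weight at once.
    delta = phi_after - phi_before
    k = int(np.argmax(np.abs(delta)))
    w = w.copy()
    w[k] += alpha * delta[k]
    return w, k

# Example: a physical correction that mostly changes one feature (say,
# distance to the human) of the robot's current trajectory.
w = np.array([0.2, 0.0, -0.1])
phi_before = np.array([0.5, 0.9, 0.30])     # features before the correction (illustrative)
phi_after = np.array([0.5, 0.4, 0.35])      # features after the correction
w, updated_feature = one_feature_update(w, phi_before, phi_after)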