• Corpus ID: 238253379

Learning Reward Functions from Scale Feedback

Nils Wilde, Erdem Biyik, Dorsa Sadigh, Stephen L. Smith
Today’s robots increasingly interact with people and need to efficiently learn inexperienced users’ preferences. A common framework is to iteratively query the user about which of two presented robot trajectories they prefer. While this minimizes the user’s effort, a strict choice does not yield any information on how much one trajectory is preferred. We propose scale feedback, where the user utilizes a slider to give more nuanced information. We introduce a probabilistic model on how…
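
The slider idea can be illustrated with a minimal sketch: model the slider response as a (clipped) noisy reward difference between the two trajectories, and update a particle distribution over candidate reward weights. The linear reward, Gaussian noise model, clipping, and all parameter values here are illustrative assumptions, not the paper's exact probabilistic model.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_feedback(w_true, f1, f2, noise=0.1):
    """Simulate a slider response in [-1, 1]; positive values favor trajectory 1.
    Clipped noisy reward difference is an assumed model, not the paper's exact one."""
    s = (f1 - f2) @ w_true + rng.normal(0.0, noise)
    return float(np.clip(s, -1.0, 1.0))

def update(particles, weights, f1, f2, s, noise=0.1):
    """Bayesian particle update: reweight each candidate reward vector by the
    Gaussian likelihood of the observed slider value."""
    pred = np.clip(particles @ (f1 - f2), -1.0, 1.0)  # predicted slider value
    lik = np.exp(-0.5 * ((s - pred) / noise) ** 2)
    w = weights * lik
    return w / w.sum()

# Toy example: 2-D trajectory features, unknown unit-norm weight vector.
w_true = np.array([0.8, 0.6])
particles = rng.normal(size=(500, 2))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
weights = np.full(len(particles), 1.0 / len(particles))

for _ in range(20):
    f1, f2 = rng.uniform(-1.0, 1.0, size=(2, 2))  # random query trajectories
    s = scale_feedback(w_true, f1, f2)
    weights = update(particles, weights, f1, f2, s)

w_est = weights @ particles
w_est /= np.linalg.norm(w_est)
print(w_est @ w_true)  # close to 1 when the estimate aligns with w_true
```

Because the slider carries magnitude as well as sign, each query constrains the weight posterior more than a strict binary choice would, which is the intuition behind the paper's efficiency gains.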


APReL: A Library for Active Preference-based Reward Learning Algorithms

APReL, a library for active preference-based reward learning algorithms, is presented; it enables researchers and practitioners to experiment with existing techniques and to easily develop their own algorithms for the various modules of the problem.

Learning from Humans for Adaptive Interaction

  • Erdem Biyik
  • Computer Science
    2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI)
  • 2022
The goal of this research is to equip robots with the capability of using multiple modes of information sources using a Bayesian learning approach, and to show how this approach is useful in a variety of applications ranging from exoskeleton gait optimization to traffic routing.

Error-Bounded Approximation of Pareto Fronts in Robot Planning Problems

This work addresses the problem of computing a set of weight vectors such that, for any other weight vector, there exists an element in the set whose error compared to optimal is minimized, and it proves fundamental properties of the optimal cost as a function of the weight vector, including its continuity and concavity.

Asking Easy Questions: A User-Friendly Approach to Active Reward Learning

This paper explores an information gain formulation for optimally selecting questions, one that naturally accounts for the human's ability to answer and determines when questions become redundant or costly.

Active Comparison Based Learning Incorporating User Uncertainty and Noise

This work presents CLAUS (Comparison Learning Algorithm for Uncertain Situations), which models user uncertainty and uses it to select and process comparison queries; results suggest that CLAUS uses fewer queries than algorithms that force users to choose, while maintaining nearly the same accuracy.

Learning from Richer Human Guidance: Augmenting Comparison-Based Learning with Feature Queries

It is proposed that there is much richer information that users can easily provide and that robots ought to leverage; richer, feature-augmented queries can extract more information faster, leading to robots that better match user preferences in their behavior.

Learning Reward Functions by Integrating Human Demonstrations and Preferences

This work proposes a new framework for reward learning, DemPref, that uses both demonstrations and preference queries to learn a reward function and finds that it is significantly more efficient than a standard active preference-based learning method.

Active Preference Learning using Maximum Regret

This work proposes a query selection that greedily reduces the maximum error ratio over the solution space and demonstrates that the proposed approach outperforms other state-of-the-art techniques in both learning efficiency and ease of queries for the user.

Active Preference-Based Learning of Reward Functions

This work builds on work in label ranking and proposes to learn from preferences (or comparisons) instead: the person provides the system a relative preference between two trajectories, and takes an active learning approach, in which the system decides on what preference queries to make.
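
The active query-selection loop described here can be sketched with a logistic (Bradley–Terry style) choice model and a mutual-information criterion over a particle set of candidate reward weights. The specific likelihood, the rationality parameter `beta`, and the random candidate queries are illustrative assumptions; the cited work's exact acquisition differs.

```python
import numpy as np

rng = np.random.default_rng(1)

def choice_prob(particles, f1, f2, beta=5.0):
    """P(user prefers trajectory 1) under an assumed logistic choice model,
    evaluated for each candidate weight vector."""
    return 1.0 / (1.0 + np.exp(-beta * (particles @ (f1 - f2))))

def entropy(p):
    """Binary entropy (nats), numerically stabilized."""
    eps = 1e-12
    return -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

def info_gain(particles, weights, f1, f2):
    """Mutual information between the user's answer and the weight vector:
    entropy of the marginal answer minus expected conditional entropy."""
    p1 = choice_prob(particles, f1, f2)
    marginal = float(weights @ p1)
    return entropy(marginal) - float(weights @ entropy(p1))

# Candidate weight vectors on the unit circle, uniform belief.
particles = rng.normal(size=(300, 2))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)
weights = np.full(len(particles), 1.0 / len(particles))

# Pick the most informative of a few random comparison queries.
queries = rng.uniform(-1.0, 1.0, size=(10, 2, 2))
gains = [info_gain(particles, weights, q[0], q[1]) for q in queries]
best = queries[int(np.argmax(gains))]
```

The gain is zero for queries whose answer every candidate weight predicts the same way, which is why active selection avoids redundant comparisons.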

Learning from Extrapolated Corrections

This work casts this extrapolation problem as online function approximation, which exposes different ways in which the robot can interpret what trajectory the person intended, depending on the function space used for the approximation.

Including Uncertainty when Learning from Human Corrections

This work argues that while the robot should estimate the most likely human preferences, it should also know what it does not know, and integrate this uncertainty when making decisions, and indicates that maintaining and leveraging uncertainty leads to faster learning from human corrections.

Active Reward Learning from Critiques

  • Yuchen Cui, S. Niekum
  • Computer Science
    2018 IEEE International Conference on Robotics and Automation (ICRA)
  • 2018
This work proposes a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that queries the user for critiques of automatically generated trajectories, utilizes trajectory segmentation to expedite the critique / labeling process, and predicts the user's critiques to generate the most highly informative trajectory queries.

Active Preference-Based Gaussian Process Regression for Reward Learning

This work models the reward function using a Gaussian process (GP) and proposes a mathematical formulation to actively find a GP using only human preferences, enabling it to tackle both the inflexibility and data-inefficiency problems within a preference-based learning framework.
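
A toy sketch of the nonparametric idea: place a GP prior over reward values at a finite set of trajectory features, score prior draws by a logistic preference likelihood, and form a weighted posterior estimate. The RBF kernel, lengthscale, rationality `beta`, the hypothetical nonlinear reward, and the importance-sampling posterior are all illustrative assumptions; the cited work uses a proper posterior approximation and actively chosen queries.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(X, ell=0.5):
    """Squared-exponential kernel over trajectory features (assumed choice)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

# Finite set of candidate trajectories and GP prior draws over their rewards.
X = rng.uniform(-1.0, 1.0, size=(30, 2))
K = rbf(X) + 1e-6 * np.eye(len(X))
prior = rng.multivariate_normal(np.zeros(len(X)), K, size=2000)

# Hypothetical nonlinear "true" reward used only to simulate preferences.
true_r = np.sin(2.0 * X[:, 0]) + X[:, 1]
prefs = []
for _ in range(40):
    i, j = rng.choice(len(X), size=2, replace=False)
    prefs.append((i, j) if true_r[i] > true_r[j] else (j, i))

# Weight each prior draw by the logistic likelihood of the observed preferences.
beta = 10.0
logw = np.zeros(len(prior))
for i, j in prefs:
    logw += -np.log1p(np.exp(-beta * (prior[:, i] - prior[:, j])))
w = np.exp(logw - logw.max())
w /= w.sum()
post_mean = w @ prior  # posterior-mean reward at each candidate trajectory
```

Because the GP is nonparametric, this setup can recover rewards that a fixed linear feature model cannot express, which is the flexibility the abstract refers to.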