Towards Transferring Human Preferences from Canonical to Actual Assembly Tasks

  title={Towards Transferring Human Preferences from Canonical to Actual Assembly Tasks},
  author={Heramb Nemlekar and Runyu Guan and Guanyang Luo and Satyandra K. Gupta and Stefanos Nikolaidis},
  journal={2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)},
  • Published 11 November 2021
  • Computer Science, Psychology
To assist human users according to their individual preferences in assembly tasks, robots typically require user demonstrations in the given task. However, providing demonstrations in actual assembly tasks can be tedious and time-consuming. Our thesis is that we can learn user preferences in actual assembly tasks from their demonstrations in a representative canonical task. Inspired by prior work on the economy of human movement, we propose to represent user preferences as a linear reward… 
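The abstract describes representing a user's preference as a linear reward function. A minimal sketch of that idea, where a preference is a weight vector over hand-designed action features (the feature names, weights, and helper functions here are illustrative, not from the paper):

```python
import numpy as np

def linear_reward(weights, features):
    """Reward of an action = dot product of preference weights and action features."""
    return float(np.dot(weights, features))

def preferred_action(weights, candidate_features):
    """Index of the candidate action whose features score highest under the weights."""
    scores = [linear_reward(weights, f) for f in candidate_features]
    return int(np.argmax(scores))

# Illustrative user who strongly dislikes physical effort (weight -1.0)
# and mildly dislikes mental effort (weight -0.1):
w = np.array([-1.0, -0.1])
actions = [np.array([0.9, 0.2]),   # high physical effort, low mental effort
           np.array([0.1, 0.8])]   # low physical effort, higher mental effort
best = preferred_action(w, actions)  # the low-physical-effort action wins
```

Under this representation, transferring a preference between tasks amounts to reusing the learned weight vector on the new task's features.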

Figures and Tables from this paper

Transfer Learning of Human Preferences for Proactive Robot Assistance in Assembly Tasks

This work proposes learning human preferences from demonstrations in a shorter, canonical task to predict user actions in the actual assembly task and shows that both transferring the preference model from the canonical task, as well as updating the model online, contribute to improved accuracy in human action prediction.

Maximum Entropy Inverse Reinforcement Learning

A probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods is developed.
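The core of the maximum entropy approach is that decision sequences are assigned probability proportional to the exponential of their reward, giving a globally normalized distribution. A toy sketch over a small set of trajectories (the reward values are illustrative, not learned):

```python
import numpy as np

def maxent_trajectory_distribution(trajectory_rewards):
    """Softmax over trajectory rewards: P(tau) is proportional to exp(R(tau))."""
    r = np.asarray(trajectory_rewards, dtype=float)
    exp_r = np.exp(r - r.max())          # subtract the max for numerical stability
    return exp_r / exp_r.sum()           # globally normalized distribution

# Higher-reward trajectories become exponentially more likely:
probs = maxent_trajectory_distribution([1.0, 2.0, 0.5])
```

In practice the normalization is computed over all feasible trajectories (e.g., via dynamic programming), not an explicit enumeration as in this sketch.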

Learning From Demonstrations Using Signal Temporal Logic in Stochastic and Continuous Domains

This letter proposes significant extensions of learning from demonstration that tackle stochastic environments and continuous state spaces, assuming that mission-level specifications for the robotic system are expressed in a suitable temporal logic such as Signal Temporal Logic (STL).

Two-Stage Clustering of Human Preferences for Action Prediction in Assembly Tasks

This work proposes a two-stage approach for learning and inferring the preferences of human operators based on the sequence of sub-tasks and actions and shows that it improves the prediction of human actions through cross-validation and applicability in a real-world robot-assisted IKEA assembly.

Learning from Demonstrations using Signal Temporal Logic

Signal Temporal Logic is used to evaluate and rank the quality of demonstrations, and it is shown that this approach outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement Learning.
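Ranking demonstrations with STL relies on quantitative robustness: how strongly a trace satisfies (or violates) a specification. A hedged sketch for the simple specification "always stay above a threshold" (G(x > c)), whose robustness is the worst-case margin over the trace; the threshold and traces are illustrative:

```python
def robustness_always_above(trace, threshold):
    """Robustness of G(x > threshold): the worst-case margin min_t (x_t - threshold).
    Positive means the trace satisfies the spec; negative means it violates it."""
    return min(x - threshold for x in trace)

def rank_demonstrations(traces, threshold):
    """Demonstration indices sorted from most to least robust."""
    scores = [robustness_always_above(t, threshold) for t in traces]
    return sorted(range(len(traces)), key=lambda i: scores[i], reverse=True)

demos = [[0.5, 0.6, 0.4],    # stays well above 0.2
         [0.5, 0.1, 0.7],    # dips below 0.2, violating the spec
         [0.3, 0.35, 0.3]]   # satisfies, but with a smaller margin
order = rank_demonstrations(demos, threshold=0.2)
```

The ranking can then weight demonstrations during reward learning, so low-quality (spec-violating) demonstrations contribute less.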

Transfer Reinforcement Learning Across Homotopy Classes

A novel fine-tuning algorithm is proposed that consists of a relaxing stage and a curriculum learning stage to enable transfer learning across homotopy classes, and it is shown that the Ease-In-Ease-Out fine-tuning method can successfully fine-tune in a sample-efficient way compared to existing baselines.

Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences

This work presents an algorithm that first uses user demonstrations to initialize a belief about the reward function and then proactively probes the user with preference queries to zero in on their true reward, yielding a framework that integrates multiple sources of information collected either passively or actively from human users.

Recent Advances in Robot Learning from Demonstration

In the context of robotics and automation, learning from demonstration (LfD) is the paradigm in which robots acquire new skills by learning to imitate an expert. The choice of LfD over other robot ...

A Comprehensive Survey on Transfer Learning

This survey connects and systematizes existing transfer learning research, and summarizes and interprets the mechanisms and strategies of transfer learning in a comprehensive way, helping readers better understand the current research status and ideas.

Decision making in slow and rapid reaching: Sacrificing success to minimize effort

This study suggests a strong tendency for humans to minimize immediate movement effort, and a general failure to adapt movement strategies flexibly as task parameters change.

Learning Reward Functions by Integrating Human Demonstrations and Preferences

This work proposes a new framework for reward learning, DemPref, that uses both demonstrations and preference queries to learn a reward function and finds that it is significantly more efficient than a standard active preference-based learning method.
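The demonstrations-plus-preferences idea can be sketched as a Bayesian update over a small set of candidate reward weights: the belief is seeded as if inferred from a demonstration, then refined by answers to pairwise preference queries. The candidate weights, features, likelihood model (logistic/Bradley-Terry), and rationality parameter below are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Candidate linear-reward weight vectors over two features:
candidates = np.array([[1.0, 0.0],    # cares only about feature 0
                       [0.0, 1.0],    # cares only about feature 1
                       [0.7, 0.7]])   # cares about both equally

# Belief over candidates, assumed to be initialized from a demonstration
# that slightly favored the first hypothesis:
belief = np.array([0.5, 0.25, 0.25])

def update_with_preference(belief, candidates, feat_a, feat_b, prefers_a, beta=5.0):
    """Bayesian update from one pairwise query, with P(a preferred | w)
    modeled by a logistic (Bradley-Terry) likelihood on the reward gap."""
    diff = candidates @ (np.asarray(feat_a) - np.asarray(feat_b))
    p_a = 1.0 / (1.0 + np.exp(-beta * diff))
    likelihood = p_a if prefers_a else 1.0 - p_a
    posterior = belief * likelihood
    return posterior / posterior.sum()   # renormalize

# The user prefers an option strong in feature 1 over one strong in feature 0,
# so the belief shifts toward the feature-1 hypothesis:
belief = update_with_preference(belief, candidates,
                                feat_a=[0.1, 0.9], feat_b=[0.9, 0.1],
                                prefers_a=True)
```

Actively choosing which pair to query next (e.g., to maximize expected information gain) is what makes such frameworks more efficient than passive preference collection.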