Eliciting Compatible Demonstrations for Multi-Human Imitation Learning

Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, and Dorsa Sadigh

Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation. While the ideal dataset for imitation learning is homogeneous and low-variance – reflecting a single, optimal method for performing a task – natural human behavior exhibits a great deal of heterogeneity, with several optimal ways to demonstrate a task. This multimodality is inconsequential to human users, with task variations manifesting as subconscious choices; for example…

Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision

Experiments suggest that the allocation of humans to robots significantly affects robot fleet performance, and that the proposed Fleet-DAgger algorithm achieves up to 8.8× higher return on human effort than baselines.

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

This study analyzes the most critical challenges when learning from offline human data for manipulation and highlights opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods.

Imitation Learning by Estimating Expertise of Demonstrators

This work develops and optimizes a joint model over a learned policy and the expertise levels of the demonstrators, learning a single policy that can outperform even the best demonstrator and that can be used to estimate the expertise of any demonstrator at any state.
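The core idea of weighting each demonstrator's data by an estimated expertise score can be sketched as follows; the function name, softmax weighting, and toy numbers below are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def expertise_weighted_loss(per_demo_losses, expertise):
    """Weight each demonstrator's imitation loss by a softmax over
    learned expertise scores, so noisy demonstrators contribute less."""
    weights = np.exp(expertise) / np.exp(expertise).sum()
    return float(np.dot(weights, per_demo_losses))

losses = np.array([0.2, 1.5])      # expert vs. noisy demonstrator
expertise = np.array([2.0, -2.0])  # learned scores favor the expert
total = expertise_weighted_loss(losses, expertise)
print(total)  # much closer to the expert's loss than to the average
```

In a full system the expertise scores would themselves be optimized jointly with the policy; here they are fixed only to show the weighting effect.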

Negative Result for Learning from Demonstration: Challenges for End-Users Teaching Robots with Task And Motion Planning Abstractions

Two novel human-subjects experiments reveal the need for fundamentally different approaches in LfD that allow end-users to teach generalizable long-horizon tasks to robots without being coached by experts at every step.

Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

A novel computational technique is developed that infers an idealized reward function from suboptimal demonstrations and bootstraps those suboptimal demonstrations to synthesize optimality-parameterized training data for reward learning.

BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

An interactive and flexible imitation learning system that can learn from both demonstrations and interventions, and can be conditioned on different forms of information that convey the task, including pretrained embeddings of natural language or videos of humans performing the task.

Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality

The approach, Confidence-Aware Imitation Learning (CAIL), learns a well-performing policy from confidence-reweighted demonstrations, using an outer loss to track the model's performance and to learn the confidence scores.
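A minimal bi-level sketch in the spirit of this setup: an inner confidence-weighted imitation fit, plus an outer signal (here, residuals relative to a small trusted subset) that updates per-demonstration confidence. The linear policy, the update rule, and the toy data are all illustrative assumptions, not CAIL's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
actions = states @ true_w
actions[:50] += rng.normal(scale=2.0, size=50)  # first half: low-optimality demos

conf = np.zeros(100)      # confidence logits, one per demonstration
trusted = slice(90, 100)  # small trusted subset anchors the outer update

for _ in range(200):
    w = np.exp(conf) / np.exp(conf).sum()
    # inner step: fit the policy to confidence-weighted demonstrations
    W = np.diag(w)
    theta = np.linalg.solve(states.T @ W @ states, states.T @ W @ actions)
    # outer step: lower confidence on demos that fit worse than the trusted set
    residual = (states @ theta - actions) ** 2
    conf -= 0.5 * (residual - residual[trusted].mean())

# noisy demonstrations end up with far lower total confidence
print(np.exp(conf[:50]).sum() < np.exp(conf[50:]).sum())
```

After a few iterations the noisy half is effectively ignored and the recovered policy parameters match the clean demonstrations.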

Learning from Interventions: Human-robot interaction as both explicit and implicit feedback

It is argued that learning interactively from expert interventions enjoys the best of both worlds, and that any amount of expert feedback, whether by intervention or non-intervention, provides information about the quality of the current state, the optimality of the action, or both.

Bottom-Up Skill Discovery From Unsegmented Demonstrations for Long-Horizon Robot Manipulation

This work presents a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations and uses these skills to synthesize prolonged robot behaviors to solve long-horizon manipulation tasks.

TRAIL: Near-Optimal Imitation Learning with Suboptimal Data

The theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning, effectively reducing the need for large near-optimal expert datasets through the use of auxiliary non-expert data.
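The latent-action idea can be sketched with a toy linear stand-in: use plentiful non-expert data to discover a low-dimensional action basis (here via SVD), then imitate the small expert dataset in that latent space. Everything below is an illustrative assumption standing in for TRAIL's learned factored transition model.

```python
import numpy as np

rng = np.random.default_rng(1)
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, -1.0]])                   # true 2-D action manifold
nonexpert_actions = rng.normal(size=(500, 2)) @ basis  # abundant suboptimal data

# discover the latent action space from non-expert data alone
_, _, vt = np.linalg.svd(nonexpert_actions, full_matrices=False)
latent_basis = vt[:2]  # top-2 right singular vectors span the manifold

# project a few expert actions into latent space and reconstruct them
expert_actions = np.array([[1.0, 1.0, 0.0],
                           [2.0, -1.0, 3.0]])
latent = expert_actions @ latent_basis.T
reconstructed = latent @ latent_basis
print(np.allclose(reconstructed, expert_actions, atol=1e-6))
```

The point is that the latent space is recovered without any expert labels, so the scarce expert data only has to cover a 2-D action space rather than the full 3-D one.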

Burn-In Demonstrations for Multi-Modal Imitation Learning

This work extends InfoGAIL, an algorithm for multi-modal imitation learning, to reproduce behavior over an extended period of time, and involves reformulating the typical imitation learning setting to include "burn-in demonstrations" upon which policies are conditioned at test time.
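The burn-in conditioning idea can be illustrated with a toy sketch: summarize a demonstration prefix into a context vector, then condition the policy on that context so it commits to the mode the prefix exhibits. The encoder and policy below are illustrative assumptions, not the paper's InfoGAIL-based model.

```python
import numpy as np

def encode_burn_in(prefix):
    """Summarize a burn-in prefix as its mean per-step displacement."""
    return np.diff(prefix, axis=0).mean(axis=0)

def policy(state, context):
    """Continue moving in the direction the burn-in established."""
    return state + context

prefix = np.array([[0.0, 0.0],
                   [1.0, 0.5],
                   [2.0, 1.0]])  # burn-in exhibits the mode: move (+1, +0.5)
ctx = encode_burn_in(prefix)

state = prefix[-1]
for _ in range(3):  # roll the conditioned policy forward
    state = policy(state, ctx)
print(state)  # rollout continues the demonstrated mode
```

With a different burn-in prefix, the same policy would reproduce a different mode, which is the multi-modal behavior the reformulated setting is after.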