Corpus ID: 235368061

XIRL: Cross-embodiment Inverse Reinforcement Learning

@article{Zakka2021XIRLCI,
  title={XIRL: Cross-embodiment Inverse Reinforcement Learning},
  author={Kevin Zakka and Andy Zeng and Peter R. Florence and Jonathan Tompson and Jeannette Bohg and Debidatta Dwibedi},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.03911}
}
We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments – shape, actions, end-effector dynamics, etc. In this work, we demonstrate that it is possible to automatically discover and learn vision-based reward functions from cross-embodiment demonstration videos that are robust to these differences. Specifically, we present a self-supervised…
Cross-Domain Imitation Learning via Optimal Transport
TLDR
The theory formally characterizes the scenarios where GWIL preserves optimality, revealing its possibilities and limitations, and the work demonstrates the effectiveness of GWIL in non-trivial continuous control domains ranging from simple rigid transformation of the expert domain to arbitrary transformation of the state-action space.

References

SHOWING 1-10 OF 69 REFERENCES
Learning Actionable Representations from Visual Observations
TLDR
This work shows that the representations learned by agents observing themselves take random actions, or other agents perform tasks successfully, can enable the learning of continuous control policies using algorithms like Proximal Policy Optimization using only the learned embeddings as input.
Time-Contrastive Networks: Self-Supervised Learning from Video
TLDR
A self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints is proposed, and it is demonstrated that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm.
Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video
TLDR
This work proposes a novel approach to learn a task-agnostic skill embedding space from unlabeled multi-view videos by using an adversarial loss, and shows that the learned embedding enables training of continuous control policies to solve novel tasks that require the interpolation of previously seen skills.
Visual Imitation Learning with Recurrent Siamese Networks
TLDR
This work addresses a particularly challenging form of this problem where only a single demonstration is provided for a given task (the one-shot learning setting), using Siamese networks trained to compute distances between observed behaviours and the agent's behaviours.
Unsupervised Perceptual Rewards for Imitation Learning
TLDR
This work presents a method that is able to identify key intermediate steps of a task from only a handful of demonstration sequences, and automatically identify the most discriminative features for identifying these steps.
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
TLDR
This work proposes an imitation learning method based on video prediction with context translation and deep reinforcement learning that enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use.
Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller
TLDR
A hierarchical setup where a high-level module learns to generate a series of first-person sub-goals conditioned on the third-person video demonstration, and a low-level controller predicts the actions to achieve those sub-goals is proposed.
Third-Person Imitation Learning
TLDR
The method's primary insight is that recent advances from domain confusion can be utilized to yield domain-agnostic features which are crucial during the training process.
Concept2Robot: Learning Manipulation Concepts from Instructions and Human Demonstrations
TLDR
This work proposes a two-stage learning process where a single multi-task policy is learned that takes as input a natural language instruction and an image of the initial scene and outputs a robot motion trajectory to achieve the specified task.
Behavioral Cloning from Observation
TLDR
This work proposes a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO) that allows the agent to acquire experience in a self-supervised fashion to develop a model, which is then utilized to learn a particular task by observing an expert perform that task without knowledge of the specific actions taken.