• Corpus ID: 203610447

Cross Domain Imitation Learning

Kuno Kim, Yihong Gu, Jiaming Song, Shengjia Zhao, Stefano Ermon
We study the question of how to imitate tasks across domains with discrepancies such as embodiment and viewpoint mismatch. Many prior works require paired, aligned demonstrations and an additional RL procedure for the task. However, paired, aligned demonstrations are seldom obtainable and RL procedures are expensive. In this work, we formalize the Cross Domain Imitation Learning (CDIL) problem, which encompasses imitation learning in the presence of viewpoint and embodiment mismatch. Informally… 

Domain-Robust Visual Imitation Learning with Mutual Information Constraints

This paper introduces a new algorithm, Disentangling Generative Adversarial Imitation Learning (DisentanGAIL), which enables autonomous agents to learn directly from high-dimensional observations of an expert performing a task by using adversarial learning with a latent representation inside the discriminator network.

Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency

This paper proposes a framework that aligns dynamic robot behavior across two domains using a cycle-consistency constraint and can directly transfer the policy trained on one domain to the other, without needing any additional fine-tuning on the second domain.

Learning from Imperfect Demonstrations via Adversarial Confidence Transfer

This work relies on demonstrations along with their confidence values from a different correspondent environment to learn a confidence predictor for the environment in which the authors aim to learn a policy (the target environment, where they only have unlabeled demonstrations); the predictor reweights the demonstrations to enable learning more from informative demonstrations and discarding the irrelevant ones.

Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning

The Importance Weighting with REjection (IWRE) algorithm, based on importance weighting and learning with rejection, is proposed to solve HOIL problems; results show that IWRE can successfully solve various HOIL tasks, including the challenging task of transforming vision-based demonstrations to random access memory (RAM)-based policies in the Atari domain, even with limited visual observations.

Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations

The Importance Weighting with REjection (IWRE) algorithm, based on importance weighting and learning with rejection, is proposed to solve HOIL problems; results show that IWRE can solve various HOIL tasks, including the challenging task of transforming vision-based demonstrations to random access memory (RAM)-based policies in the Atari domain, even with limited visual observations.

HILONet: Hierarchical Imitation Learning from Non-Aligned Observations

A new imitation learning approach called Hierarchical Imitation Learning from Observation (HILONet) is proposed; it adopts a hierarchical structure to dynamically choose feasible sub-goals from demonstrated observations and can solve all kinds of tasks by achieving these sub-goals, whether or not a task has a single goal position.

Provably Efficient Third-Person Imitation from Offline Observation

This work presents problem-dependent, statistical learning guarantees for third-person imitation from observation in an offline setting, and a lower bound on performance in the online setting.



Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

This work proposes an imitation learning method based on video prediction with context translation and deep reinforcement learning that enables a variety of interesting applications, including learning robotic skills that involve tool use simply by observing videos of human tool use.

Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment

An autonomous framework is introduced that uses unsupervised manifold alignment to learn inter-task mappings and effectively transfer samples between different task domains and demonstrates its effectiveness for cross-domain transfer in the context of policy gradient RL.

Time-Contrastive Networks: Self-Supervised Learning from Video

A self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints is proposed, and it is demonstrated that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm.

End-to-End Training of Deep Visuomotor Policies

This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.

Third-Person Imitation Learning

The method's primary insight is that recent advances from domain confusion can be utilized to yield domain-agnostic features, which are crucial during the training process.

Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning

This paper introduces a problem formulation where two agents are tasked with learning multiple skills by sharing information and uses the skills that were learned by both agents to train invariant feature spaces that can be used to transfer other skills from one agent to another.

Generative Adversarial Imitation Learning

A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.

An automated measure of MDP similarity for transfer in reinforcement learning

A data-driven automated similarity measure for Markov Decision Processes is proposed, based on the reconstruction error of a restricted Boltzmann machine that attempts to model the behavioral dynamics of the two MDPs being compared; it can be used to identify similar source tasks for transfer learning.

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a

Cross-Domain Transfer in Reinforcement Learning Using Target Apprentice

It is shown that the optimal policy from a related source task can be near-optimal in the target domain, provided an adaptive policy accounts for the model error between the target and the projected source.