Corpus ID: 226299546

Transformers for One-Shot Visual Imitation

@inproceedings{Dasari2020TransformersFO,
  title={Transformers for One-Shot Visual Imitation},
  author={Sudeep Dasari and Abhinav Kumar Gupta},
  booktitle={CoRL},
  year={2020}
}
Humans are able to seamlessly visually imitate others, by inferring their intentions and using past experience to achieve the same end goal. In other words, we can parse complex semantic knowledge from raw video and efficiently translate that into concrete motor control. Is it possible to give a robot this same capability? Prior research in robot imitation learning has created agents which can acquire diverse skills from expert human operators. However, expanding these techniques to work with a… 
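
As a rough illustration of the one-shot visual imitation setting described above, the sketch below conditions a policy on a single demonstration video plus the current observation via standard transformer self-attention, and trains it with behavior cloning. This is a minimal, assumption-level sketch in PyTorch; the module names, feature dimensions, and action head are illustrative and are not the authors' exact architecture.

# Minimal sketch of a one-shot visual imitation policy pi(a_t | demo, o_t).
# Illustrative interface only, NOT the paper's exact architecture:
# encoder, dimensions, and action head are assumptions.
import torch
import torch.nn as nn

class OneShotImitationPolicy(nn.Module):
    def __init__(self, img_feat_dim=512, d_model=256, action_dim=7, n_layers=4):
        super().__init__()
        self.proj = nn.Linear(img_feat_dim, d_model)          # project frame features
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(d_model, action_dim)     # e.g. end-effector deltas + gripper

    def forward(self, demo_feats, obs_feat):
        # demo_feats: (B, T, img_feat_dim) features of the single demo video
        # obs_feat:   (B, img_feat_dim)    feature of the current observation
        tokens = torch.cat([demo_feats, obs_feat.unsqueeze(1)], dim=1)
        h = self.encoder(self.proj(tokens))                   # self-attention over demo + obs
        return self.action_head(h[:, -1])                     # predict action from the obs token

# Behavior cloning on paired (demo, observation, expert action) data:
policy = OneShotImitationPolicy()
demo = torch.randn(2, 20, 512)            # 20 demo frames, pre-extracted visual features
obs = torch.randn(2, 512)
expert_action = torch.randn(2, 7)
loss = nn.functional.mse_loss(policy(demo, obs), expert_action)
loss.backward()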

Manipulator-Independent Representations for Visual Imitation

TLDR
A way to train manipulator-independent representations (MIR) is presented; these representations primarily focus on the change in the environment and have all the characteristics that make them suitable for cross-embodiment visual imitation with RL: cross-domain alignment, temporal smoothness, and being actionable.

Towards More Generalizable One-shot Visual Imitation Learning

TLDR
MOSAIC (Multi-task One-Shot Imitation with self-Attention and Contrastive learning) is proposed; it integrates a self-attention model architecture and a temporal contrastive module to enable better task disambiguation and more robust representation learning.

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

TLDR
PerAct, a language-conditioned behavior-cloning agent for multi-task 6-DoF manipulation, outperforms unstructured image-to-action agents and 3D ConvNet baselines for a wide range of tabletop tasks.

Meta-Imitation Learning by Watching Video Demonstrations

TLDR
This work presents an approach to meta-imitation learning by watching video demonstrations from humans that is able to translate human videos into practical robot demonstrations and train the meta-policy with an adaptive loss based on the quality of the translated data.

What Matters in Language Conditioned Robotic Imitation Learning Over Unstructured Data

TLDR
An extensive study of the most critical challenges in learning language conditioned policies from offline free-form imitation datasets is conducted and a novel approach is presented that significantly outperforms the state of the art on the challenging language conditioned long-horizon robot manipulation CALVIN benchmark.

BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

TLDR
An interactive and flexible imitation learning system that can learn from both demonstrations and interventions and can be conditioned on different forms of information that convey the task, including pretrained embeddings of natural language or videos of humans performing the task.

Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation

TLDR
This work proposes a novel approach to learning few-shot imitation agents, called demonstration-conditioned reinforcement learning (DCRL), and shows that DCRL outperforms methods based on behaviour cloning on navigation tasks and on robotic manipulation tasks from the Meta-World benchmark.

Behavior Transformers: Cloning k modes with one stone

TLDR
Behavior Transformer, a new technique to model unlabeled demonstration data with multiple modes, is presented; it improves over prior state-of-the-art work on solving demonstrated tasks while capturing the major modes present in the pre-collected datasets.
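
The multi-modal cloning idea above can be illustrated with a small sketch: continuous actions are clustered into k "modes", and each action is represented as a mode index plus a continuous offset, so a policy can predict a categorical mode and regress the residual. This is a simplified, assumption-level sketch using scikit-learn's KMeans; it is not the paper's full transformer implementation.

# Illustrative sketch of the "k action modes" idea: discretize continuous actions
# into k cluster centers, then learn to predict (cluster id, residual offset).
# Simplified assumption-level code, not the paper's implementation.
import numpy as np
from sklearn.cluster import KMeans

def fit_action_bins(actions, k=8):
    """Cluster demonstrated actions into k modes (actions: (N, action_dim))."""
    km = KMeans(n_clusters=k, n_init=10).fit(actions)
    return km.cluster_centers_                    # (k, action_dim)

def encode(action, centers):
    """Represent an action as its nearest mode plus a continuous offset."""
    dists = np.linalg.norm(centers - action, axis=1)
    idx = int(np.argmin(dists))
    return idx, action - centers[idx]             # classification target, regression target

def decode(idx, offset, centers):
    """Reconstruct a continuous action from a predicted mode and offset."""
    return centers[idx] + offset

# Example: 1000 demonstrated 7-DoF actions drawn from two distinct modes
actions = np.concatenate([np.random.randn(500, 7) + 2.0,
                          np.random.randn(500, 7) - 2.0])
centers = fit_action_bins(actions, k=2)
idx, offset = encode(actions[0], centers)
assert np.allclose(decode(idx, offset, centers), actions[0])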

Back to Reality for Imitation Learning

TLDR
This paper proposes that the most appropriate evaluation metric for robot learning is not data efficiency, but time efficiency, which captures the real-world cost much more truthfully.

Robotic Grasping from Classical to Modern: A Survey

TLDR
This paper surveys the advances of robotic grasping, starting from the classical formulations and solutions to the modern ones, and discusses the open problems and the future research directions that may be important for the human-level robustness, autonomy, and intelligence of robots.

References

SHOWING 1-10 OF 57 REFERENCES

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

TLDR
This work presents an approach for one-shot learning from a video of a human: human and robot demonstration data from a variety of previous tasks is used to build up prior knowledge through meta-learning, and by combining this prior knowledge with only a single video demonstration from a human, the robot can perform the task that the human demonstrated.

One-Shot Visual Imitation Learning via Meta-Learning

TLDR
A meta-imitation learning method that enables a robot to learn how to learn more efficiently, allowing it to acquire new skills from just a single demonstration, and requires data from significantly fewer prior tasks for effective learning of new skills.
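
A common formulation of such meta-imitation learning is a MAML-style inner/outer loop: adapt the policy with one gradient step on a single demonstration, then optimize the adapted policy's behavior-cloning loss on a held-out trial of the same task. The sketch below assumes that formulation, with a toy functional policy and placeholder data; it is not the paper's exact objective.

# Minimal MAML-style inner/outer loop for meta-imitation learning.
# Assumes a behavior-cloning (MSE) loss; the policy and data are toy placeholders.
import torch
import torch.nn.functional as F

def policy(obs, params):
    """A tiny functional policy: linear map from observation features to actions."""
    W, b = params
    return obs @ W + b

def bc_loss(params, obs, actions):
    return F.mse_loss(policy(obs, params), actions)

def meta_step(params, tasks, inner_lr=0.01):
    """Adapt on each task's single demo, then evaluate the adapted parameters
    on a held-out trial of the same task (the meta-objective)."""
    outer = 0.0
    for (demo_obs, demo_act), (trial_obs, trial_act) in tasks:
        loss = bc_loss(params, demo_obs, demo_act)
        grads = torch.autograd.grad(loss, params, create_graph=True)  # keep graph for outer grad
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]   # one inner gradient step
        outer = outer + bc_loss(adapted, trial_obs, trial_act)
    return outer / len(tasks)

# Toy usage: two tasks, each with a demo batch and a held-out trial batch
params = [torch.randn(16, 4, requires_grad=True), torch.zeros(4, requires_grad=True)]
tasks = [((torch.randn(8, 16), torch.randn(8, 4)),
          (torch.randn(8, 16), torch.randn(8, 4))) for _ in range(2)]
meta_loss = meta_step(params, tasks)
meta_loss.backward()          # gradients flow through the inner adaptation step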

Third-Person Visual Imitation Learning via Decoupled Hierarchical Controller

TLDR
A hierarchical setup is proposed in which a high-level module learns to generate a series of first-person sub-goals conditioned on the third-person video demonstration, and a low-level controller predicts the actions to achieve those sub-goals.

Grounding Language in Play

TLDR
A simple and scalable way to condition policies on human language instead of language pairing is presented, and a simple technique is introduced that transfers knowledge from large unlabeled text corpora to robotic learning and significantly improves downstream robotic manipulation.

One-Shot Imitation Learning

TLDR
A meta-learning framework for achieving one-shot imitation learning, in which, ideally, robots should be able to learn from very few demonstrations of any given task and instantly generalize to new situations of the same task, without requiring task-specific engineering.

Time-Contrastive Networks: Self-Supervised Learning from Video

TLDR
A self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints is proposed, and it is demonstrated that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm.
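
The time-contrastive objective described above can be sketched as a triplet loss: embeddings of two synchronized viewpoints at the same moment are pulled together, while an embedding of a temporally distant frame from the same video is pushed away. The encoder, margin, and negative-sampling offset below are illustrative assumptions, not the paper's exact training setup.

# Sketch of a multi-view time-contrastive (triplet) objective:
# same moment seen from two viewpoints -> positive pair;
# temporally distant frame from the same video -> negative.
import torch
import torch.nn.functional as F

def tcn_triplet_loss(encoder, view1, view2, t, neg_offset=30, margin=0.2):
    """view1, view2: (T, C, H, W) synchronized videos; t: anchor time index."""
    T = view1.shape[0]
    t_neg = (t + neg_offset) % T                 # a temporally distant frame, same viewpoint
    anchor   = encoder(view1[t : t + 1])         # embedding of view 1 at time t
    positive = encoder(view2[t : t + 1])         # same moment seen from view 2
    negative = encoder(view1[t_neg : t_neg + 1]) # different moment, same view
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

# Toy usage with a small, hypothetical convolutional encoder
encoder = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 5, stride=4), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 32))
view1 = torch.randn(100, 3, 64, 64)
view2 = torch.randn(100, 3, 64, 64)
loss = tcn_triplet_loss(encoder, view1, view2, t=10)
loss.backward()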

Concept2Robot: Learning manipulation concepts from instructions and human demonstrations

TLDR
This work aims to endow a robot with the ability to learn manipulation concepts that link natural language instructions to motor skills by proposing a two-stage learning process in which the robot first learns single-task policies through reinforcement learning and then a multi-task policy through imitation learning.

AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos

TLDR
This paper takes an automated approach and performs pixel-level image translation via CycleGAN to convert the human demonstration into a video of a robot, which can then be used to construct a reward function for a model-based RL algorithm.

Unsupervised Perceptual Rewards for Imitation Learning

TLDR
This work presents a method that is able to identify key intermediate steps of a task from only a handful of demonstration sequences, and to automatically identify the most discriminative features for recognizing these steps.

Improvisation through Physical Understanding: Using Novel Objects as Tools with Visual Foresight

TLDR
This work trains a model with both a visual and a physical understanding of multi-object interactions and develops a sampling-based optimizer that can leverage these interactions to accomplish tasks, showing that the robot can perceive and use novel objects as tools, including objects that are not conventional tools, while also choosing dynamically whether or not to use tools depending on whether they are required.
...