MetaMorph: Learning Universal Controllers with Transformers

Agrim Gupta, Linxi (Jim) Fan, Surya Ganguli, Li Fei-Fei
Multiple domains like vision, natural language, and audio are witnessing tremendous progress by leveraging Transformers for large-scale pre-training followed by task-specific fine-tuning. In contrast, in robotics we primarily train a single robot for a single task. However, modular robot systems now allow for the flexible combination of general-purpose building blocks into task-optimized morphologies. Given the exponentially large number of possible robot morphologies, training a…

A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation

This work explores a method for learning a single policy that manipulates various forms of agents to solve various tasks by distilling a large amount of proficient behavioral data. It suggests that large, diverse offline datasets, a unified IO representation, and policy representation and architecture selection through supervised learning form a promising approach for studying and advancing morphology-task generalization.

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

This work investigates PerAct, a language-conditioned behavior-cloning agent for multi-task 6-DoF manipulation, and shows that it significantly outperforms unstructured image-to-action agents and 3D ConvNet baselines for a wide range of tabletop tasks.

VIMA: General Robot Manipulation with Multimodal Prompts

This work designs a transformer-based generalist robot agent, VIMA, that processes multimodal prompts and outputs motor actions autoregressively, and that achieves strong scalability in both model capacity and data size.

Low-Rank Modular Reinforcement Learning via Muscle Synergy

This work proposes a Synergy-Oriented LeARning (SOLAR) framework that exploits the redundant nature of DoF in robot control and achieves a low-rank control at the synergy level on a variety of robot morphologies.

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

This work introduces MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions, and proposes a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function.

Co-design of Embodied Neural Intelligence via Constrained Evolution

A novel co-design method for autonomous moving agents’ shape attributes and locomotion is introduced by combining deep reinforcement learning and evolution with user control and provides satisfactory results by training thousands of agents within one hour.

Minimal neural network models for permutation invariant agents

This work constructs a conceptually simple model that exhibits flexibility most ANNs lack, demonstrates the model's properties on multiple control problems, and shows that it can cope with even very rapid permutations of input indices, as well as changes in input size.

N-LIMB: Neural Limb Optimization for Efficient Morphological Design

N-LIMB is presented, an efficient approach to optimizing the design and control of a robot over large sets of morphologies. Central to this framework is a universal, design-conditioned control policy capable of controlling a diverse set of designs.

Adapting Neural Models with Sequential Monte Carlo Dropout

Experimental results show improved performance in control problems requiring both online and look-ahead prediction, and showcase the interpretability of the inferred masks in a human behaviour modelling task for drone tele-operation.

RT-1: Robotics Transformer for Real-World Control at Scale

This paper presents a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties, and verifies the conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity, based on a large-scale data collection on real robots performing real-world tasks.



Hardware Conditioned Policies for Multi-Robot Transfer Learning

This work uses the kinematic structure directly as the hardware encoding and shows great zero-shot transfer to completely novel robots not seen during training and demonstrates that fine-tuning the policy network is significantly more sample-efficient than training a model from scratch.

Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms

The proposed method can successfully adapt a trained policy to different robotic platforms with novel physical parameters and the superiority of the meta-learning algorithm compared to state-of-the-art methods for the introduced few-shot policy adaptation problem is demonstrated.

One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control

It is shown that a single modular policy can successfully generate locomotion behaviors for several planar agents with different skeletal structures such as monopod hoppers, quadrupeds, bipeds, and generalize to variants not seen during training -- a process that would normally require training and manual hyperparameter tuning for each morphology.

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation

This study analyzes the most critical challenges when learning from offline human data for manipulation and highlights opportunities for learning from human datasets, such as the ability to learn proficient policies on challenging, multi-stage tasks beyond the scope of current reinforcement learning methods.

Task-Agnostic Morphology Evolution

Without any task or reward specification, TAME evolves morphologies by only applying randomly sampled action primitives on a population of agents using an information-theoretic objective that efficiently ranks agents by their ability to reach diverse states in the environment and the causality of their actions.

Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing both with large and limited training data.

Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

This paper investigates a modular co-evolution strategy: a collection of primitive agents learns to dynamically self-assemble into composite bodies while also learning to coordinate their behavior to control these bodies.

Relational inductive biases, deep learning, and graph networks

It is argued that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective.

NerveNet: Learning Structured Policy with Graph Neural Networks

NerveNet is proposed to explicitly model the structure of an agent, which naturally takes the form of a graph; its policies are demonstrated to be significantly more transferable and generalizable than policies learned by other models, and are able to transfer even in a zero-shot setting.

Snowflake: Scaling GNNs to High-Dimensional Continuous Control via Parameter Freezing

SNOWFLAKE is introduced, a GNN training method for high-dimensional continuous control that freezes parameters in parts of the network that suffer from overfitting, and significantly boosts the performance of GNNs for locomotion control on large agents, now matching the performance of MLPs, and with superior transfer properties.