Corpus ID: 246823748

Factored World Models for Zero-Shot Generalization in Robotic Manipulation

@article{Biza2022FactoredWM,
  title={Factored World Models for Zero-Shot Generalization in Robotic Manipulation},
  author={Ondrej Biza and Thomas Kipf and David Klee and Robert W. Platt and J.-W. van de Meent and Lawson L. S. Wong},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.05333}
}
World models for environments with many objects face a combinatorial explosion of states: as the number of objects increases, the number of possible arrangements grows exponentially. In this paper, we learn to generalize over robotic pick-and-place tasks using object-factored world models, which combat the combinatorial explosion by ensuring that predictions are equivariant to permutations of objects. Previous object-factored models were limited either by their inability to model actions, or by… 

Planning for Multi-Object Manipulation with Graph Neural Network Relational Classifiers

A novel graph neural network framework for multi-object manipulation to predict how inter-object relations change given robot actions, which enables multi-step planning to reach target goal relations and shows the model trained purely in simulation transfers well to the real world.

Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation

This paper proposes to use structured world models to incorporate relational inductive biases in the control loop to achieve sample-efficient and interaction-rich exploration in compositional multi-object environments and showcases that the self-reinforcing cycle between good models and good exploration also opens up another avenue: zero-shot generalization to downstream tasks via model-based planning.

Binding Actions to Objects in World Models

Two attention mechanisms for binding actions to objects, soft attention and hard attention, are proposed and shown to help contrastively-trained structured world models to learn to separate individual objects in an object-based grid-world environment.

Binding Actions to Objects in World Models

Two attention mechanisms for binding actions to objects, soft attention and hard attention, are proposed and it is shown that hard attention helps contrastively-trained structured world models to learn to separate individual objects in an object-based grid-world environment.

References


Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning

It is shown that graph-based relational architectures overcome this limitation, enable learning of complex tasks when provided with a simple curriculum of tasks with increasing numbers of objects, and exhibit zero-shot generalization.

Object-centric Forward Modeling for Model Predictive Control

An approach to learning an object-centric forward model that can be leveraged to search for action sequences that lead to desired goal configurations; in conjunction with a learned correction module, this allows for robust closed-loop execution.

Reasoning About Physical Interactions with Object-Oriented Prediction and Planning

This work presents a paradigm for learning object-centric representations for physical scene understanding without direct supervision of object properties, and can use its learned representations to build block towers more complicated than those observed during training.

Action Priors for Large Action Spaces in Robotics

This paper proposes an alternative approach where the solutions of previously solved tasks are used to produce an action prior that can facilitate exploration in future tasks.

Visual Reinforcement Learning with Imagined Goals

An algorithm is proposed that acquires general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies, efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.

Structured Object-Aware Physics Prediction for Video Modeling and Planning

STOVE is presented, a novel state-space model for videos, which explicitly reasons about objects and their positions, velocities, and interactions, and outperforms previous unsupervised models, and even approaches the performance of supervised baselines.

Graph networks as learnable physics engines for inference and control

A new class of learnable models, based on graph networks, is introduced; these models implement an inductive bias for object- and relation-centric representations of complex, dynamical systems, and offer new opportunities for harnessing and exploiting rich knowledge about the world.

Interaction Networks for Learning about Objects, Relations and Physics

The interaction network is introduced, a model which can reason about how objects in complex systems interact, supporting dynamical predictions, as well as inferences about the abstract properties of the system, and is implemented using deep neural networks.

An algebraic approach to abstraction in reinforcement learning

This work introduces relativized options, a generalization of Markov sub-goal options, that allow options to be defined without an absolute frame of reference. It also introduces an extension to the options framework that allows learning simultaneously at multiple levels of the hierarchy, states guarantees regarding the performance of hierarchical systems that employ approximate abstraction, and demonstrates the approach in several test-beds.

Contrastive Learning of Structured World Models

These experiments demonstrate that C-SWMs can overcome limitations of models based on pixel reconstruction and outperform typical representatives of this model class in highly structured environments, while learning interpretable object-based representations.