Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks

@article{Seita2020LearningTR,
  title={Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks},
  author={Daniel Seita and Peter R. Florence and Jonathan Tompson and Erwin Coumans and Vikas Sindhwani and Ken Goldberg and Andy Zeng},
  journal={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021},
  pages={4568-4575}
}
Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as rigid object poses, and may involve complex relative spatial relations such as "place the item inside the bag". In this… 
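
The mechanism referenced throughout this page, pixel-wise pick-and-place affordances conditioned on a goal image, can be illustrated with a small sketch. The code below is an assumption-laden simplification rather than the paper's implementation: the tiny network sizes, the channel-wise concatenation of observation and goal, and the fixed crop size are illustrative choices; the cross-correlation of a feature crop around the chosen pick against scene features is the Transporter-style placement scoring.

# Minimal sketch of goal-conditioned, pixel-wise pick-and-place affordances.
# Not the authors' code: layer sizes, obs/goal fusion by concatenation,
# and the crop size are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """Small fully convolutional encoder: image -> dense feature map."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class GoalConditionedPickPlace(nn.Module):
    def __init__(self, in_ch=4, crop=16):
        super().__init__()
        self.crop = crop
        # Every stream sees the current observation stacked with the goal image.
        self.pick_enc  = TinyFCN(2 * in_ch, 1)    # -> pick heatmap
        self.key_enc   = TinyFCN(2 * in_ch, 16)   # -> scene ("where to place") features
        self.query_enc = TinyFCN(2 * in_ch, 16)   # -> features to crop around the pick

    def forward(self, obs, goal):
        x = torch.cat([obs, goal], dim=1)
        pick_logits = self.pick_enc(x)[0, 0]                   # (H, W)
        r, c = divmod(int(pick_logits.argmax()), pick_logits.shape[1])

        key, query = self.key_enc(x), self.query_enc(x)
        half = self.crop // 2
        qpad = F.pad(query, (half, half, half, half))
        kernel = qpad[:, :, r:r + self.crop, c:c + self.crop]  # crop centered on pick
        # Cross-correlate the pick-centered crop with the scene features:
        # a high response marks a good placement for the picked content.
        place_logits = F.conv2d(key, kernel, padding=half)[0, 0]
        pr, pc = divmod(int(place_logits.argmax()), place_logits.shape[1])
        return (r, c), (pr, pc), pick_logits, place_logits

# Random tensors stand in for RGB-D observation and goal images.
model = GoalConditionedPickPlace()
pick, place, _, _ = model(torch.rand(1, 4, 64, 64), torch.rand(1, 4, 64, 64))

In training, the pick and place heatmaps would be supervised with demonstrated pick-and-place pixels, as in behavior cloning.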

Deep Reinforcement Learning Based on Local GNN for Goal-Conditioned Deformable Object Rearranging

This paper designs a local GNN (Graph Neural Network)-based learning method that uses two representation graphs to encode keypoints detected from images and can be easily transferred to a real robot by fine-tuning a keypoint detector. A rough illustration of the graph-over-keypoints idea follows below.
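
As a rough sketch of that idea (the k-nearest-neighbor edges, mean-aggregation message passing, and raw pixel coordinates as node features are assumptions, not the paper's design), one representation graph can be built per image and passed through a small GNN layer:

# Rough sketch, not the paper's architecture: build one graph per image from
# detected keypoints and run a single mean-aggregation message-passing step.
import numpy as np

def knn_graph(keypoints, k=3):
    """Connect each keypoint to its k nearest neighbors (symmetric adjacency)."""
    d = np.linalg.norm(keypoints[:, None, :] - keypoints[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    adj = np.zeros(d.shape)
    for i in range(len(keypoints)):
        adj[i, np.argsort(d[i])[:k]] = 1.0
    return np.maximum(adj, adj.T)

def gnn_layer(features, adj, W_self, W_nbr):
    """One message-passing step: mix each node with the mean of its neighbors."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    nbr_mean = (adj @ features) / deg
    return np.tanh(features @ W_self + nbr_mean @ W_nbr)

rng = np.random.default_rng(0)
cur_kps  = rng.uniform(0, 64, (8, 2))   # keypoints detected in the current image
goal_kps = rng.uniform(0, 64, (8, 2))   # keypoints detected in the goal image
W_self, W_nbr = rng.normal(size=(2, 16)), rng.normal(size=(2, 16))
cur_emb  = gnn_layer(cur_kps,  knn_graph(cur_kps),  W_self, W_nbr)
goal_emb = gnn_layer(goal_kps, knn_graph(goal_kps), W_self, W_nbr)
# A downstream policy could compare cur_emb with goal_emb to choose a rearranging action.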

Graph-Transporter: A Graph-based Learning Method for Goal-Conditioned Deformable Object Rearranging Task

Graph-Transporter, a novel framework based on a Fully Convolutional Network, is presented; it outputs pixel-wise pick-and-place actions from visual input alone and is effective and general at handling goal-conditioned deformable object rearranging tasks.

Benchmarking Deformable Object Manipulation with Differentiable Physics

This work presents DaXBench, a differentiable deformable object manipulation (DOM) benchmark with wide object and task coverage, provides careful empirical studies of existing decision-making algorithms based on differentiable physics, and discusses their limitations as well as potential future directions.

Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks

A visual foresight model for pick-and-place rearrangement manipulation that learns efficiently, paired with a multi-modal action proposal module built on the Goal-Conditioned Transporter Network, a state-of-the-art imitation learning method.

CLIPort: What and Where Pathways for Robotic Manipulation

CLIPort, a language-conditioned imitation learning agent that combines the broad semantic understanding of CLIP with the spatial precision of Transporter, is presented; it can solve a variety of language-specified tabletop tasks without any explicit representations of object poses, instances, history, symbolic states, or syntactic structures.

Guided Visual Attention Model Based on Interactions Between Top-down and Bottom-up Prediction for Robot Pose Prediction

A novel Key-Query-Value formulated visual attention model is proposed that can switch attention targets by externally modifying the Query representations, i.e., top-down attention.

Graph-based Task-specific Prediction Models for Interactions between Deformable and Rigid Objects

This work contributes a simulation environment and generates a novel dataset for task-specific manipulation, involving interactions between rigid objects and a deformable bag, and combines modules with different prediction horizons into a mixed-horizon model which addresses long-term prediction.

Multi-Task Learning with Sequence-Conditioned Transporter Networks

This work proposes MultiRavens, a new suite of benchmarks aimed specifically at compositional tasks, and a vision-based end-to-end system architecture that augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling, which can efficiently learn to solve multi-task long-horizon problems. A toy illustration of the weighted-sampling idea appears below.
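
The weighted-sampling component lends itself to a tiny sketch; the specific rule (sampling tasks in proportion to their current failure rate) and the task names are assumptions for illustration, not the paper's exact scheme.

# Toy illustration of weighted demonstration sampling: tasks the policy solves
# poorly are drawn more often. The inverse-success weighting and the task names
# are assumptions, not the paper's exact scheme.
import numpy as np

rng = np.random.default_rng(0)
success_rate = {"placing": 0.9, "chaining": 0.4, "routing": 0.2}  # hypothetical

tasks = list(success_rate)
weights = np.array([1.0 - success_rate[t] for t in tasks]) + 1e-3
weights /= weights.sum()

# Draw the tasks whose demonstrations fill the next training batch.
batch_tasks = rng.choice(tasks, size=8, p=weights)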

Foldsformer: Learning Sequential Multi-Step Cloth Manipulation With Space-Time Attention

Foldsformer can complete multi-step cloth manipulation tasks even when the configurations of the cloth differ from those in the general demonstrations, and it can be transferred from simulation to the real world without additional training or domain randomization.

MIRA: Mental Imagery for Robotic Affordances

Mental Imagery for Robotic Affordances (MIRA) is introduced, an action reasoning framework that optimizes actions with novel-view synthesis and affordance prediction in the loop, paving the way toward machines that autonomously learn to understand the world around them for planning actions.
...

References

(Showing 1-10 of 73 references)

Transporter Networks: Rearranging the Visual World for Robotic Manipulation

The Transporter Network is proposed, a simple model architecture that rearranges deep features to infer spatial displacements from visual input, which can parameterize robot actions, and it learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses.

Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data

This work presents an approach that learns point-pair correspondences between initial and goal rope configurations, which implicitly encodes geometric structure, entirely in simulation from synthetic depth images, and demonstrates that the learned representation — dense depth object descriptors (DDODs) — can be used to manipulate a real rope into a variety of different arrangements.
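
The descriptor idea admits a compact sketch: given per-pixel descriptors for a reference (goal) configuration and for the current image, a pick point annotated in the reference can be transferred by nearest-neighbor matching in descriptor space. The shapes and random descriptor maps below are placeholders.

# Sketch of correspondence transfer with dense per-pixel descriptors.
# The descriptor maps here are random placeholders; in practice they would come
# from a network trained on synthetic depth images.
import numpy as np

def transfer_pixel(ref_desc, cur_desc, ref_pixel):
    """ref_desc, cur_desc: (H, W, D) descriptor images; ref_pixel: (row, col)."""
    target = ref_desc[ref_pixel]                            # descriptor to match
    dists = np.linalg.norm(cur_desc - target, axis=-1)      # (H, W) distances
    return np.unravel_index(dists.argmin(), dists.shape)    # best match in current image

rng = np.random.default_rng(0)
ref_desc, cur_desc = rng.random((48, 48, 16)), rng.random((48, 48, 16))
pick_in_current = transfer_pixel(ref_desc, cur_desc, (10, 20))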

Learning Arbitrary-Goal Fabric Folding with One Hour of Real Robot Experience

This paper shows that it is possible to learn fabric folding skills in only an hour of self-supervised real robot experience, without human supervision or simulation, and creates an expressive goal-conditioned pick and place policy that can be trained efficiently with real world robot data only.

SoftGym: Benchmarking Deep Reinforcement Learning for Deformable Object Manipulation

SoftGym is presented, a set of open-source simulated benchmarks for manipulating deformable objects, with a standard OpenAI Gym API and a Python interface for creating new environments, to enable reproducible research in this important area.

Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly

This work proposes to formulate the kit assembly task as a shape matching problem, where the goal is to learn a shape descriptor that establishes geometric correspondences between object surfaces and their target placement locations from visual input.

Learning to Manipulate Deformable Objects without Demonstrations

This paper proposes an iterative pick-place action space that encodes the conditional relationship between picking and placing on deformable objects and obtains an order of magnitude faster learning compared to independent action-spaces on a suite of deformable object manipulation tasks with visual RGB observations.

VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation

The Visual Foresight framework is extended to learn fabric dynamics that can be efficiently reused to accomplish different fabric manipulation tasks with a single goal-conditioned policy, and it is found that leveraging depth significantly improves performance.
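
A visual-foresight-style planner reduces to a short loop: sample candidate pick-and-place actions, roll each forward through the learned dynamics model, and execute the action whose predicted observation is closest to the goal. Everything below (the stub dynamics model, random-shooting search, pixel-space L2 cost) is an assumption-level simplification.

# Sketch of goal-conditioned planning with a learned visual dynamics model.
# The stub model, random-shooting search, and pixel-space cost are simplifications.
import numpy as np

rng = np.random.default_rng(0)

def dynamics_model(image, action):
    """Stand-in for a learned predictor of the next observation after a pick-place."""
    pick, place = action
    pred = image.copy()
    pred[tuple(place)] = pred[tuple(pick)]   # toy effect: content moves pick -> place
    return pred

def plan(image, goal, n_samples=128):
    h, w = image.shape
    best_action, best_cost = None, np.inf
    for _ in range(n_samples):
        action = (rng.integers(0, [h, w]), rng.integers(0, [h, w]))
        cost = np.linalg.norm(dynamics_model(image, action) - goal)
        if cost < best_cost:
            best_action, best_cost = action, cost
    return best_action

obs, goal = rng.random((32, 32)), rng.random((32, 32))
pick_pixel, place_pixel = plan(obs, goal)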

Learning Dense Visual Correspondences in Simulation to Smooth and Fold Real Fabrics

Learning point-pair correspondences across different fabric configurations in simulation makes it possible to define policies to robustly imitate a broad set of multi-step fabric smoothing and folding tasks, and suggests robustness to fabrics of various colors, sizes, and shapes.

Learning Robotic Manipulation through Visual Planning and Acting

This work learns to imagine goal-directed object manipulation directly from raw image data of self-supervised interaction of the robot with the object, and shows that separating the problem into visual planning and visual tracking control is more efficient and more interpretable than alternative data-driven approaches.

TossingBot: Learning to Throw Arbitrary Objects With Residual Physics

This work proposes an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (RGB-D images of arbitrary objects in a bin) through trial and error and generalizes to new objects and target locations.
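
The residual-physics idea can be shown in a few lines: an analytic projectile model gives a baseline release speed for a desired landing distance, and a learned function adds a per-object correction. The sketch assumes a 45-degree release on flat ground and uses a placeholder residual function.

# Sketch of residual physics for throwing: analytic ballistic estimate plus a
# learned correction. The flat-ground 45-degree model and the stub residual
# function are assumptions for illustration.
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def ballistic_speed(target_dist, angle_rad=np.pi / 4):
    """Release speed so a projectile lands target_dist away at its launch height."""
    return np.sqrt(target_dist * G / np.sin(2 * angle_rad))

def learned_residual(visual_features, target_dist):
    """Stand-in for a network predicting a small per-object speed correction (m/s)."""
    return 0.05 * np.tanh(visual_features.mean()) * target_dist

features = np.random.default_rng(0).random(64)   # placeholder for visual features
release_speed = ballistic_speed(1.5) + learned_residual(features, 1.5)
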
...