Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data

Priya Sundaresan, Jennifer Grannen, Brijen Thananjeyan, Ashwin Balakrishna, Michael Laskey, Kevin Stone, Joseph Gonzalez, and Ken Goldberg. 2020 IEEE International Conference on Robotics and Automation (ICRA).

Robotic manipulation of deformable 1D objects such as ropes, cables, and hoses is challenging due to the lack of high-fidelity analytic models and large configuration spaces. Furthermore, learning end-to-end manipulation policies directly from images and physical interaction requires significant time on a robot and can fail to generalize across tasks. We address these challenges using interpretable deep visual representations for rope, extending recent work on dense object descriptors for robot… 

Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications

This paper shows that, given a 3D model of an object, its descriptor-space image can be generated, which allows for supervised training of dense object descriptors; the method relies on Laplacian Eigenmaps (LE) to embed the 3D model of an object into an optimally generated descriptor space.
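
The core idea — embedding a mesh so that geodesically nearby vertices get nearby descriptors via the eigenvectors of a graph Laplacian — can be illustrated with a rough sketch. This is not the paper's implementation; the toy "mesh" here is just a 6-vertex cycle graph, and `laplacian_eigenmap` is a hypothetical helper name:

```python
import numpy as np

def laplacian_eigenmap(adjacency, dim=2):
    """Embed graph vertices into `dim` coordinates using eigenvectors
    of the graph Laplacian L = D - A for the smallest nonzero
    eigenvalues (the Laplacian Eigenmaps construction)."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # ascending eigenvalues
    # Skip the trivial constant eigenvector (eigenvalue ~ 0).
    return eigvecs[:, 1:1 + dim]

# Toy "mesh": 6 vertices connected in a cycle.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

emb = laplacian_eigenmap(adj, dim=2)
# Adjacent vertices land closer together in the embedding than
# vertices on opposite sides of the cycle.
d_adjacent = np.linalg.norm(emb[0] - emb[1])
d_opposite = np.linalg.norm(emb[0] - emb[3])
```

Nearby surface points thus receive similar descriptor values, which is what makes the embedding usable as a dense supervision target.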

Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation

We present Neural Descriptor Fields (NDFs), an object representation that encodes both points and relative poses between an object and a target (such as a robot gripper or a rack used for hanging).

Learning Dense Visual Descriptors using Image Augmentations for Robot Manipulation Tasks

This work proposes a self-supervised training approach for learning view-invariant dense visual descriptors using image augmentations and shows that training on synthetic correspondences provides descriptor consistency across a broad range of camera views.
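
The reason augmentations yield free supervision is that when the transform applied to an image is known, the pixel correspondence between the original and augmented views is known exactly. A minimal sketch with a pure translation augmentation (my simplification, not the paper's augmentation set):

```python
import numpy as np

def translate_with_correspondence(image, dy, dx):
    """Shift `image` by (dy, dx) with wrap-around; every source pixel
    (y, x) corresponds to (y + dy, x + dx) in the augmented image,
    giving dense correspondence labels at zero annotation cost."""
    warped = np.roll(image, shift=(dy, dx), axis=(0, 1))
    h, w = image.shape[:2]

    def correspond(y, x):
        # Ground-truth match of source pixel (y, x) in the warped image.
        return ((y + dy) % h, (x + dx) % w)

    return warped, correspond

img = np.arange(16).reshape(4, 4)
warped, corr = translate_with_correspondence(img, 1, 2)
y2, x2 = corr(0, 0)
```

Descriptor networks are then trained so that the descriptor at (y, x) in the original matches the descriptor at `corr(y, x)` in the augmented view.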

Learning to Smooth and Fold Real Fabric Using Dense Object Descriptors Trained on Synthetic Color Images

This paper learns visual representations of deformable fabric by training dense object descriptors that capture correspondences across images of fabric in various configurations, facilitating multistep fabric smoothing and folding tasks on real physical systems.

Transporter Networks: Rearranging the Visual World for Robotic Manipulation

The Transporter Network is proposed: a simple model architecture that rearranges deep features to infer spatial displacements from visual input, which can parameterize robot actions. It learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses.
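
The "rearranging deep features" step can be pictured as sliding a feature crop around the picked object over the scene's feature map and scoring each placement by correlation. A toy single-channel sketch under that reading (the real model uses learned deep features and rotations; `place_scores` is an illustrative name):

```python
import numpy as np

def place_scores(scene_feat, query_feat):
    """Cross-correlate `query_feat` (a crop around the picked object)
    against `scene_feat`; each position's correlation is a score for
    placing the object there."""
    H, W = scene_feat.shape
    h, w = query_feat.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            scores[y, x] = np.sum(scene_feat[y:y + h, x:x + w] * query_feat)
    return scores

# Toy scene feature map: a bright 2x2 region at (3, 4) marks the goal.
scene = np.zeros((8, 8))
scene[3:5, 4:6] = 1.0
query = np.ones((2, 2))
scores = place_scores(scene, query)
best = np.unravel_index(np.argmax(scores), scores.shape)
```

The argmax of the correlation map then parameterizes the place action, which is what lets the architecture exploit the spatial structure of the input directly.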

Learning Dense Visual Correspondences in Simulation to Smooth and Fold Real Fabrics

Learning point-pair correspondences across different fabric configurations in simulation makes it possible to define policies to robustly imitate a broad set of multi-step fabric smoothing and folding tasks, and suggests robustness to fabrics of various colors, sizes, and shapes.

ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation

ACID, an action-conditional visual dynamics model for volumetric deformable objects based on structured implicit neural representations, achieves the best performance in geometry, correspondence, and dynamics predictions over existing approaches.

Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks

This work proposes embedding goal-conditioning into Transporter Networks, a recently proposed model architecture for learning robotic manipulation that rearranges deep features to infer displacements that can represent pick-and-place actions. It demonstrates that goal-conditioned Transporter Networks enable agents to manipulate deformable structures into flexibly specified configurations without test-time visual anchors for target locations.

Unsupervised Learning of Visual 3D Keypoints for Control

This work proposes a framework to learn such 3D keypoint structure directly from images in an end-to-end unsupervised manner, and outperforms prior state-of-the-art methods across a variety of reinforcement learning benchmarks.

Learning Robot Policies for Untangling Dense Knots in Linear Deformable Structures

HULK is able to successfully untangle a linear deformable object (LDO) from a dense initial configuration containing up to two overhand and figure-eight knots in 97.9% of 378 simulated experiments, with an average of 12.1 actions per trial, suggesting that the policy can learn the task of untangling effectively from an algorithmic supervisor.

Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation

Dense Object Nets are presented, which build on recent developments in self-supervised dense descriptor learning, as a consistent object representation for visual understanding and manipulation, and it is demonstrated that they can be trained quickly for a wide variety of previously unseen and potentially non-rigid objects.
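
The training objective in this line of work is commonly a pixelwise contrastive loss: descriptors of matched pixels are pulled together, while non-matches are pushed apart beyond a margin. A minimal numpy sketch of that loss under my own naming and a toy 2-D descriptor space (the real networks output dense per-pixel descriptor images):

```python
import numpy as np

def pixelwise_contrastive_loss(desc_a, desc_b, matches, non_matches, margin=0.5):
    """Contrastive loss over pixel descriptor pairs: matched pixels are
    pulled together (squared distance), non-matches are pushed past
    `margin` (squared hinge)."""
    match_d = np.linalg.norm(desc_a[matches[:, 0]] - desc_b[matches[:, 1]], axis=1)
    non_d = np.linalg.norm(desc_a[non_matches[:, 0]] - desc_b[non_matches[:, 1]], axis=1)
    l_match = np.mean(match_d ** 2)
    l_non = np.mean(np.maximum(0.0, margin - non_d) ** 2)
    return l_match + l_non

# Two "images" with 2 pixels each, descriptors already well-separated:
desc_a = np.array([[0.0, 0.0], [1.0, 0.0]])
desc_b = np.array([[0.0, 0.0], [1.0, 0.0]])
matches = np.array([[0, 0]])       # pixel 0 in A matches pixel 0 in B
non_matches = np.array([[0, 1]])   # pixel 0 in A vs pixel 1 in B
loss = pixelwise_contrastive_loss(desc_a, desc_b, matches, non_matches)
```

Here the matched pair has zero distance and the non-match already exceeds the margin, so the loss is zero — the configuration the training drives toward.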

Learning Robotic Manipulation through Visual Planning and Acting

This work learns to imagine goal-directed object manipulation directly from raw image data of the robot's self-supervised interaction with the object, and shows that separating the problem into visual planning and visual tracking control is more efficient and more interpretable than alternative data-driven approaches.

Combining self-supervised learning and imitation for vision-based rope manipulation

It is shown that by combining the high and low-level plans, the robot can successfully manipulate a rope into a variety of target shapes using only a sequence of human-provided images for direction.

Self-Supervised Visual Descriptor Learning for Dense Correspondence

A new approach to learning visual descriptors for dense correspondence estimation is advocated in which the power of a strong three-dimensional generative model is harnessed to automatically label correspondences in RGB-D video data.
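
The automatic labeling works because a registered 3D model pins down where each surface point lands in every camera view: projecting the same model vertex into two views yields a ground-truth pixel correspondence. A rough pinhole-projection sketch (intrinsics and poses here are made-up example values, not from the paper):

```python
import numpy as np

def project(point_world, world_from_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Project a 3D world point to pixel coordinates for a pinhole
    camera whose pose is a 4x4 world-from-camera transform."""
    cam_from_world = np.linalg.inv(world_from_cam)
    p = cam_from_world @ np.append(point_world, 1.0)
    u = fx * p[0] / p[2] + cx
    v = fy * p[1] / p[2] + cy
    return np.array([u, v])

# One model vertex seen from two cameras -> one labeled pixel pair.
vertex = np.array([0.0, 0.0, 1.0])
cam1 = np.eye(4)                     # camera at the origin, looking +z
cam2 = np.eye(4)
cam2[0, 3] = 0.1                     # second camera shifted 10 cm in x
uv1 = project(vertex, cam1)
uv2 = project(vertex, cam2)
```

Sweeping this over all model vertices and all frames of an RGB-D video produces the dense correspondence labels used to supervise the descriptor network.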

Grasp2Vec: Learning Object Representations from Self-Supervised Grasping

This paper studies how to acquire effective object-centric representations for robotic manipulation tasks without human labeling, using self-supervised autonomous robot interaction with the environment.

Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8 s with a success rate of 93% on eight known objects with adversarial geometry.

Task-oriented grasping with semantic and geometric scene understanding

A key element of this work is to use a deep network to integrate contextual task cues, and defer the structured-output problem of gripper pose computation to an explicit (learned) geometric model.

Object recognition and full pose registration from a single image for robotic manipulation

This paper presents an approach for building metric 3D models of objects using local descriptors from several images, optimized to fit a set of calibrated training images, thus obtaining the best possible alignment between the 3D model and the real object.

Interactive computational imaging for deformable object analysis

It is shown that even stiff, fragile, or low-texture objects can be distinguished based on their mechanical behaviours by applying a periodic stimulus and matched video filtering and analysis pipeline, and that objects are linearly distinguishable under this approach.

Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a Multi-Armed Bandit model with correlated rewards

The Dexterity Network (Dex-Net) 1.0, a dataset of 3D object models and a sampling-based planning algorithm to explore how Cloud Robotics can be used for robust grasp planning, and reports on system sensitivity to variations in similarity metrics and in uncertainty in pose and friction.