• Corpus ID: 239998474

Dex-NeRF: Using a Neural Radiance Field to Grasp Transparent Objects

Jeffrey Ichnowski, Yahav Avigal, Justin Kerr, Ken Goldberg. Conference on Robot Learning.
The ability to grasp and manipulate transparent objects is a major challenge for robots. Existing depth cameras have difficulty detecting, localizing, and inferring the geometry of such objects. We propose using neural radiance fields (NeRF) to detect, localize, and infer the geometry of transparent objects with sufficient accuracy to find and grasp them securely. We leverage NeRF’s view-independent learned density, place lights to increase specular reflections, and perform a transparency-aware…

GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF

This work proposes a multiview RGB-based 6-DoF grasp detection network, GraspNeRF, that leverages a generalizable neural radiance field (NeRF) to achieve material-agnostic object grasping in clutter, and demonstrates that it outperforms all the baselines in all the experiments while running in real time.

TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and A Grasping Baseline

This work contributes a large-scale real-world dataset for transparent object depth completion, which contains 57,715 RGB-D images from 130 different scenes and proposes an end-to-end depth completion network, which takes the RGB image and the inaccurate depth map as inputs and outputs a refined depth map.

A4T: Hierarchical Affordance Detection for Transparent Objects Depth Reconstruction and Manipulation

Extensive experiments show that the proposed methods can predict accurate affordance maps, and significantly improve the depth reconstruction of transparent objects compared to the state-of-the-art method, with the Root Mean Squared Error in meters significantly decreased.

Neural Fields for Robotic Object Manipulation from a Single Image

The authors believe this to be the first work to retrieve grasping poses directly from a NeRF-based representation using a single viewpoint (RGB-only), rather than going through a secondary network and/or representation.

NeRF2Real: Sim2real Transfer of Vision-guided Bipedal Motion Skills using Neural Radiance Fields

It is demonstrated that this system can be used to learn vision-based whole-body navigation and ball-pushing policies for a 20-degree-of-freedom humanoid robot with an actuated head-mounted RGB camera, and to transfer these policies to a real robot.

TransNet: Category-Level Transparent Object Pose Estimation

This work proposes a two-stage pipeline that learns to estimate category-level transparent object pose using localized depth completion and surface normal estimation. It demonstrates that TransNet achieves improved pose estimation accuracy on transparent objects, and key findings from the included ablation studies suggest future directions for performance improvements.

NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields

This work proposes a transformer-based framework, NeRF-Loc, to extract 3D bounding boxes of objects in NeRF scenes and designs a pair of parallel transformer encoder branches to encode both the context and details of target objects.

Implicit Object Mapping With Noisy Data

This paper uses the outputs of an object-based SLAM system to bound objects in the scene with coarse primitives and, in concert with instance masks, to identify obstructions in the training images. It shows that object-based NeRFs are robust to pose variations but sensitive to the quality of the instance masks.

AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains

This work proposes a new methodology for grasp perception to enable robots to grasp as robustly as humans, and develops a dense supervision strategy with real perception and analytic labels in the spatial-temporal domain.

Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds

The proposed framework synergizes the advantages of vision and touch, and greatly improves the grasping efficiency of transparent objects.



ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation

ClearGrasp is substantially better than monocular depth estimation baselines, generalizes to real-world images and novel objects, and can be applied out-of-the-box to improve grasping algorithms’ performance on transparent objects.

LIT: Light-Field Inference of Transparency for Refractive Object Localization

It is demonstrated that LIT can outperform both state-of-the-art end-to-end pose estimation methods and a generative pose estimator on transparent objects.

RGB-D Local Implicit Function for Depth Completion of Transparent Objects

A new approach for depth completion of transparent objects from a single RGB-D image using a local implicit neural representation built on ray-voxel pairs that allows the method to generalize to unseen objects and achieve fast inference speed.

Deep learning for detecting robotic grasps

This work presents a two-step cascaded system with two deep networks, where the top detections from the first are re-evaluated by the second, and shows that this method improves performance on an RGBD robotic grasping dataset, and can be used to successfully execute grasps on two different robotic platforms.

Seeing Glassware: from Edge Detection to Pose Estimation and Shape Recovery

This work introduces a new approach that combines recent advances in learnt object detectors with perceptual grouping in 2D and projective geometry of apparent contours in 3D, and shows results comparable to category-based detection and localization of opaque objects without any training on the object shape.

Vision-Only Robot Navigation in a Neural Radiance World

This work proposes an algorithm for navigating a robot through a 3D environment represented as a NeRF using only an on-board RGB camera for localization, together with an optimization-based filtering method to estimate the robot’s 6DoF pose and velocities in the NeRF.

Volumetric Grasping Network: Real-time 6 DOF Grasp Detection in Clutter

The proposed Volumetric Grasping Network (VGN) accepts a Truncated Signed Distance Function (TSDF) representation of the scene and directly outputs the predicted grasp quality and the associated gripper orientation and opening width for each voxel in the queried 3D volume.

Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8 sec with a success rate of 93% on eight known objects with adversarial geometry.

Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes

This work proposes an end-to-end network that efficiently generates a distribution of 6-DoF parallel-jaw grasps directly from a depth recording of a scene. It treats 3D points of the recorded point cloud as potential grasp contacts and reduces the dimensionality of the grasp representation to 4-DoF, which greatly facilitates the learning process.

Depth-supervised NeRF: Fewer Views and Faster Training for Free

This work formalizes the above assumption through DS-NeRF (Depth-supervised Neural Radiance Fields), a loss for learning radiance fields that takes advantage of readily-available depth supervision and can render better images given fewer training views while training 2-3x faster.