3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

@article{Zeng2017,
  title={3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions},
  author={Andy Zeng and Shuran Song and Matthias Nie{\ss}ner and Matthew Fisher and Jianxiong Xiao and Thomas A. Funkhouser},
  journal={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017}
}
  • Published 27 March 2016
Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we… 
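The "local volumetric patch" idea can be sketched in miniature: extract a fixed-size voxel grid around an interest point in a point cloud. The sketch below uses binary occupancy as a simplified stand-in for the TSDF values 3DMatch actually feeds its network; the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def local_voxel_patch(points, center, voxel_size=0.02, grid_dim=30):
    """Voxelize the points inside a cubic region around `center` into a
    binary occupancy grid (a simplified stand-in for a TSDF patch)."""
    half = grid_dim * voxel_size / 2.0
    local = points - center                        # shift into the patch frame
    inside = np.all(np.abs(local) < half, axis=1)  # keep points in the cube
    idx = np.floor((local[inside] + half) / voxel_size).astype(int)
    idx = np.clip(idx, 0, grid_dim - 1)
    grid = np.zeros((grid_dim,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0    # mark occupied voxels
    return grid

# Toy usage: two nearby points fall in the same voxel; the far one is clipped out.
pts = np.array([[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [0.5, 0.5, 0.5]])
patch = local_voxel_patch(pts, center=np.zeros(3))
```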
Learning 3D Keypoint Descriptors for Non-rigid Shape Matching
A novel deep learning framework that derives discriminative local descriptors for 3D surface shapes by leveraging a triplet network to perform deep metric learning, which takes a set of triplets as input and is minimized to distinguish between similar and dissimilar pairs of keypoints.
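The triplet objective described above has a simple closed form: pull a keypoint's descriptor toward a matching one and push it at least a margin away from a non-matching one. A minimal sketch (function name and margin value are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: similar keypoint descriptors are pulled
    together, dissimilar ones pushed at least `margin` further apart."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.1])   # similar keypoint descriptor
n = np.array([-1.0, 0.0])  # dissimilar keypoint descriptor
loss = triplet_loss(a, p, n)  # already satisfied, so the hinge is zero
```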
Learning Local Shape Descriptors from Part Correspondences with Multiview Convolutional Networks
A new local descriptor for 3D shapes is presented, directly applicable to a wide range of shape analysis problems such as point correspondences, semantic segmentation, affordance prediction, and shape-to-scan matching by a convolutional network trained to embed geometrically and semantically similar points close to one another in descriptor space.
Neighborhood Normalization for Robust Geometric Feature Learning
This work introduces a new normalization technique, Batch-Neighborhood Normalization, aiming to improve robustness to mean-std variation of local feature distributions that presumably can happen in samples with varying point density.
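The underlying idea can be sketched by normalizing each point's feature with the mean and std of its local k-nearest neighborhood rather than batch-wide statistics; the actual Batch-Neighborhood Normalization formulation differs, and all names and parameters here are illustrative:

```python
import numpy as np

def neighborhood_normalize(points, feats, k=3, eps=1e-8):
    """Normalize each point's feature by the mean/std of the features in
    its k-nearest spatial neighborhood, making the statistics local and
    therefore less sensitive to point-density variation."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]   # k nearest neighbors (incl. self)
    neigh = feats[nn]                   # (N, k, C) neighborhood features
    mu = neigh.mean(axis=1)
    sigma = neigh.std(axis=1)
    return (feats - mu) / (sigma + eps)

# Toy usage: constant features normalize to (near) zero everywhere.
pts = np.array([[0.0, 0, 0], [0.1, 0, 0], [0.2, 0, 0], [1.0, 0, 0]])
feats = np.ones((4, 2))
out = neighborhood_normalize(pts, feats, k=3)
```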
WSDesc: Weakly Supervised 3D Local Descriptor Learning for Point Cloud Registration
This work proposes a novel registration loss based on the deviation from rigidity of 3D transformations, and the loss is weakly supervised by the prior knowledge that the input point clouds have partial overlap, without requiring ground-truth alignment information.
D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features
This paper proposes a keypoint selection strategy that overcomes the inherent density variations of 3D point clouds, and proposes a self-supervised detector loss guided by the on-the-fly feature matching results during training.
Geometric Priors from Robot Vision in Deep Networks for 3D Object Classification
The benefits of using a deep network to improve on classical histogram-based descriptors are demonstrated, and results show competitive accuracy and robustness while being rotation invariant and using 10-100x fewer parameters than some competing methods.
Learning Geodesic-Aware Local Features from RGB-D Images
The Perfect Match: 3D Point Cloud Matching With Smoothed Densities
This work proposes 3DSmoothNet, a full workflow to match 3D point clouds with a siamese deep learning architecture and fully convolutional layers using a voxelized smoothed density value (SDV) representation, and shows that 3DSmoothNet trained only on RGB-D indoor scenes of buildings achieves 79.0% average recall, more than double the performance of the closest learning-based competitors.
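The SDV idea, storing a Gaussian-smoothed point density in each voxel instead of a hard count, can be sketched as follows; grid size, voxel size, and smoothing width are illustrative values, not the paper's:

```python
import numpy as np

def smoothed_density_grid(points, center, voxel_size=0.05, grid_dim=8, sigma=0.05):
    """SDV-style voxelization: each voxel accumulates a Gaussian-weighted
    contribution from nearby points rather than a binary occupancy flag."""
    half = grid_dim * voxel_size / 2.0
    axes = (np.arange(grid_dim) + 0.5) * voxel_size - half   # voxel centers
    gx, gy, gz = np.meshgrid(axes, axes, axes, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3) + center
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * sigma**2)).sum(axis=1)       # smoothed density
    return density.reshape(grid_dim, grid_dim, grid_dim)

# Toy usage: a single point at the patch center yields a density peak there.
grid = smoothed_density_grid(np.array([[0.0, 0.0, 0.0]]), center=np.zeros(3))
```

The Gaussian smoothing is what buys robustness to noise and varying point density: small perturbations of the input points change the voxel values smoothly instead of flipping occupancy bits.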
LCD: Learned Cross-Domain Descriptors for 2D-3D Matching
Experimental results confirm the robustness of the proposed dual auto-encoder neural network approach as well as its competitive performance not only in solving cross-domain tasks but also in being able to generalize to solve sole 2D and 3D tasks.


Self-Supervised Visual Descriptor Learning for Dense Correspondence
A new approach to learning visual descriptors for dense correspondence estimation is advocated in which the power of a strong three-dimensional generative model is harnessed to automatically label correspondences in RGB-D video data.
3D ShapeNets: A deep representation for volumetric shapes
This work proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network, and shows that this 3D deep representation enables significant performance improvement over the-state-of-the-arts in a variety of tasks.
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
  • S. Song, Jianxiong Xiao · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
This work proposes the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D.
Fine-to-Coarse Global Registration of RGB-D Scans
A fine-to-coarse global registration algorithm that leverages robust registrations at finer scales to seed detection and enforcement of new correspondence and structural constraints at coarser scales is proposed.
Discriminative Learning of Deep Convolutional Feature Point Descriptors
This paper uses Convolutional Neural Networks to learn discriminant patch representations, training a Siamese network with pairs of (non-)corresponding patches to develop 128-D descriptors whose Euclidean distances reflect patch similarity and can serve as a drop-in replacement for any task involving SIFT.
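The pairwise siamese objective behind such descriptors is typically a contrastive loss: corresponding patches are pulled together, non-corresponding ones pushed beyond a margin. A minimal sketch (function name and margin are illustrative):

```python
import numpy as np

def contrastive_loss(desc1, desc2, is_match, margin=1.0):
    """Siamese pairwise loss over descriptor pairs: matches are penalized
    by their squared distance, non-matches by how far they fall short of
    `margin` in Euclidean space."""
    d = np.linalg.norm(desc1 - desc2)
    if is_match:
        return 0.5 * d**2
    return 0.5 * max(0.0, margin - d) ** 2

# Toy usage: an identical pair costs nothing as a match,
# but the full margin penalty if labeled a non-match.
d1 = np.array([0.0, 1.0])
d2 = np.array([0.0, 1.0])
match_loss = contrastive_loss(d1, d2, is_match=True)   # 0.0
push_loss = contrastive_loss(d1, d2, is_match=False)   # 0.5 * margin**2
```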
SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels
SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places is introduced, and a generalization of bundle adjustment that incorporates object-to-object correspondences is introduced.
Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images
We address the problem of inferring the pose of an RGB-D camera relative to a known 3D scene, given only a single acquired image. Our approach employs a regression forest that is capable of inferring…
Real-time 3D reconstruction at scale using voxel hashing
An online system for large and fine scale volumetric reconstruction based on a memory and speed efficient data structure that compresses space, and allows for real-time access and updates of implicit surface data, without the need for a regular or hierarchical grid data structure.
Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge
This paper proposes a self-supervised method to generate a large labeled dataset without tedious manual segmentation and demonstrates that the system can reliably estimate the 6D pose of objects under a variety of scenarios.
Model globally, match locally: Efficient and robust 3D object recognition
A novel method is proposed that creates a global model description based on oriented point pair features and matches that model locally using a fast voting scheme, which allows using much sparser object and scene point clouds, resulting in very fast performance.
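The oriented point pair feature at the heart of this method is the classic 4-tuple F(p1, p2) = (||d||, ∠(n1, d), ∠(n2, d), ∠(n1, n2)) with d = p2 − p1, computed from two surface points and their normals. A minimal sketch:

```python
import numpy as np

def angle(u, v):
    """Angle between two vectors, in [0, pi]."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    """4D point pair feature: (||d||, angle(n1, d), angle(n2, d),
    angle(n1, n2)) with d = p2 - p1. Discretized versions of this
    tuple index the hash table used by the fast voting scheme."""
    d = p2 - p1
    return np.array([np.linalg.norm(d),
                     angle(n1, d),
                     angle(n2, d),
                     angle(n1, n2)])

# Toy usage: two points one unit apart on a flat surface (parallel normals).
f = point_pair_feature(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                       np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
```

Because the tuple depends only on relative distance and angles, it is invariant to rigid transformations, which is what lets the global model description be matched against sparse scene point clouds.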