PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding

Saining Xie, Jiatao Gu, Demi Guo, C. Qi, Leonidas J. Guibas, Or Litany
Arguably one of the top success stories of deep learning is transfer learning. The finding that pre-training a network on a rich source set (e.g., ImageNet) can help boost performance once fine-tuned on a usually much smaller target set has been instrumental to many applications in language and vision. Yet, very little is known about its usefulness in 3D point cloud understanding. We see this as an opportunity considering the effort required for annotating data in 3D. In this work, we aim at…

P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding

Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization

Self-Supervised Pretraining of 3D Features on any Point-Cloud

This work presents a simple self-supervised pretraining method that can work with single-view depth scans acquired by varied sensors, without 3D registration and point correspondences, and sets a new state-of-the-art for object detection on ScanNet and SUNRGBD.

Pre-Training by Completing Point Clouds

OcCo is shown to learn representations that improve semantic understanding and generalization on downstream tasks over prior methods, transfer to different datasets, reduce training time, and improve label efficiency.

Pri3D: Can 3D Priors Help 2D Representation Learning?

This work proposes to employ contrastive learning under both multi-view image constraints and image-geometry constraints to encode 3D priors into learned 2D representations, which results in improvement over 2D-only representation learning on the image-based tasks of semantic segmentation, instance segmentation and object detection on real-world indoor datasets.

Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts

This study reveals that exhaustive labelling of 3D point clouds might be unnecessary; remarkably, on ScanNet, even when using only 0.1% of point labels, the method achieves state-of-the-art results on a suite of benchmarks where training data or labels are scarce.

3D Point Cloud Registration with Multi-Scale Architecture and Self-supervised Fine-tuning

A strategy to fine-tune MS-SVConv on unknown datasets in a self-supervised way, which leads to state-of-the-art results on ETH and TUM datasets.

Weakly Supervised Learning of Rigid 3D Scene Flow

A data-driven scene flow estimation algorithm exploiting the observation that many 3D scenes can be explained by a collection of agents moving as rigid bodies, which allows the requirement for dense scene flow supervision to be relaxed to simpler binary background segmentation masks and ego-motion annotations.

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

DenseCL is presented, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images and outperforms the state-of-the-art methods by a large margin.

Label-Efficient Point Cloud Semantic Segmentation: An Active Learning Approach

This work adopts a super-point based active learning strategy that makes use of a manifold defined on the point cloud geometry, achieves significant improvement at all annotation budgets, and outperforms the state-of-the-art methods at the same annotation cost.

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

This paper presents an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points, which is three times the number of labeled points in the largest existing photogrammetric point cloud dataset.



PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

A hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set and proposes novel set learning layers to adaptively combine features from multiple scales to learn deep point set features efficiently and robustly.

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

This paper designs a novel type of neural network that directly consumes point clouds, which respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.

ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes

This work builds on VoteNet and proposes a 3D detection architecture called ImVoteNet, specialized for RGB-D scenes and based on fusing 2D votes in images and 3D votes in point clouds, advancing state-of-the-art results by 5.7 mAP.

Self-Supervised Deep Learning on Point Clouds by Reconstructing Space

This work proposes a self-supervised learning task for deep learning on raw point cloud data in which a neural network is trained to reconstruct point clouds whose parts have been randomly rearranged, and demonstrates that pre-training with this method before supervised training improves the performance of state-of-the-art models and significantly improves sample efficiency.

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

3DMatch is presented, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data and consistently outperforms other state-of-the-art approaches by a significant margin.

ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes

This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.

Local Spectral Graph Convolution for Point Set Feature Learning

This article replaces the standard max pooling step with a recursive clustering and pooling strategy, devised to aggregate information from within clusters of nodes that are close to one another in their spectral coordinates, leading to richer overall feature descriptors.

Unsupervised Representation Learning by Predicting Image Rotations

This work proposes to learn image features by training ConvNets to recognize the 2D rotation applied to the input image, and demonstrates both qualitatively and quantitatively that this apparently simple task actually provides a very powerful supervisory signal for semantic feature learning.

FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation

A novel end-to-end deep auto-encoder is proposed to address unsupervised learning challenges on point clouds, and is shown, in theory, to be a generic architecture that is able to reconstruct an arbitrary point cloud from a 2D grid.

Attentional ShapeContextNet for Point Cloud Recognition

The resulting model, called ShapeContextNet, consists of a hierarchy of modules that do not rely on a fixed grid while still enjoying properties similar to those of convolutional neural networks, namely the ability to capture and propagate object part information.