Masked Discrimination for Self-Supervised Learning on Point Clouds

Haotian Liu, Mu Cai, Yong Jae Lee
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains. However, mask-based pretraining has yet to show benefits for point cloud understanding, likely because standard backbones like PointNet cannot properly handle the mismatch between training and testing distributions that masking introduces during training. In this paper, we bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds…
Unsupervised Point Cloud Representation Learning with Deep Neural Networks: A Survey
This paper provides a comprehensive review of unsupervised point cloud representation learning using DNNs, and quantitatively benchmarks and discusses the reviewed methods over multiple widely adopted point cloud datasets.
A Survey of Visual Transformers
This survey comprehensively reviews over one hundred different visual Transformers according to three fundamental CV tasks and different data stream types, and proposes the deformable attention module, which combines the best of the sparse spatial sampling of deformable convolution and the relation modeling capability of Transformers.
Masked Autoencoders for Self-Supervised Learning on Automotive Point Clouds
Masked autoencoding has become a successful pretraining paradigm for Transformer models for text, images, and recently, point clouds. Raw automotive datasets are a suitable candidate for


PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding
This work aims to facilitate research on 3D representation learning by selecting a suite of diverse datasets and tasks to measure the effect of unsupervised pre-training on a large source set of 3D scenes, and achieves improvement over recent best results in segmentation and detection across 6 different benchmarks.
Self-Supervised Learning for Domain Adaptation on Point Clouds
A new family of pretext tasks, Deformation Reconstruction, inspired by the deformations encountered in sim-to-real transformations, is introduced, along with a novel training procedure for labeled point cloud data, motivated by the MixUp method, called Point Cloud Mixup (PCM).
Unsupervised Point Cloud Pre-training via Occlusion Completion
This paper shows that this method outperforms previous pre-training methods in object classification and in both part-based and semantic segmentation tasks, and that, even when pre-trained on a single dataset (ModelNet40), it improves accuracy across different datasets and encoders.
Self-Supervised Learning of Point Clouds via Orientation Estimation
This paper leverages 3D self-supervision for learning downstream tasks on point clouds with fewer labels and demonstrates that its approach outperforms the state-of-the-art.
Self-Supervised Pretraining of 3D Features on any Point-Cloud
This work presents a simple self-supervised pretraining method that can work with single-view depth scans acquired by varied sensors, without 3D registration and point correspondences, and sets a new state-of-the-art for object detection on ScanNet and SUNRGBD.
Self-Supervised Learning of Local Features in 3D Point Clouds
This work presents a self-supervised task on point clouds, in order to learn meaningful point-wise features that encode local structure around each point, using a multi-layer RNN to predict the next point in a point sequence created by a popular and fast Space Filling Curve, the Morton-order curve.
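The Morton-order (Z-order) curve mentioned above can be illustrated with a minimal sketch: quantize each point to an integer grid, interleave the bits of the three coordinates into one key, and sort by that key. This is only an illustration of the ordering step, not the cited work's implementation; the function names (`part1by2`, `morton_code`, `morton_sort`) and the 10-bit grid resolution are assumptions made for this example, and the RNN that predicts the next point is not shown.

```python
def part1by2(n: int, bits: int = 10) -> int:
    """Spread the low `bits` bits of n so bit i moves to position 3*i."""
    result = 0
    for i in range(bits):
        result |= ((n >> i) & 1) << (3 * i)
    return result


def morton_code(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of three non-negative integers into one Morton key."""
    return (
        part1by2(x, bits)
        | (part1by2(y, bits) << 1)
        | (part1by2(z, bits) << 2)
    )


def morton_sort(points, bits: int = 10):
    """Order (x, y, z) float tuples along the Morton curve.

    Coordinates are rescaled per axis to the integer range [0, 2**bits - 1]
    (an assumed quantization scheme) before the keys are computed.
    """
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    scale = (1 << bits) - 1

    def quantize(p):
        return tuple(
            int((p[d] - mins[d]) / (maxs[d] - mins[d]) * scale)
            if maxs[d] > mins[d]
            else 0
            for d in range(3)
        )

    return sorted(points, key=lambda p: morton_code(*quantize(p), bits=bits))


cloud = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ordered = morton_sort(cloud)
```

Points that are close along the resulting sequence tend to be close in space, which is what makes the ordered sequence usable as input to a sequence model.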
Implicit Autoencoder for Point Cloud Self-supervised Representation Learning
Implicit Autoencoder (IAE) is introduced, a simple yet effective method that addresses the challenge of autoencoding on point clouds by replacing the point cloud decoder with an implicit decoder that outputs a continuous representation shared among different point cloud samplings of the same model.
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
The proposed BERT-style pretraining strategy improves the performance of standard point cloud Transformers, and the representations learned by Point-BERT transfer well to new tasks and domains, where the models largely advance the state-of-the-art in few-shot point cloud classification.
Masked Feature Prediction for Self-Supervised Visual Pre-Training
This work presents Masked Feature Prediction (MaskFeat), which first randomly masks out a portion of the input sequence and then predicts the features of the masked regions, and finds that Histograms of Oriented Gradients (HOG), a hand-crafted feature descriptor, works particularly well in terms of both performance and efficiency.
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
This paper designs a novel type of neural network that directly consumes point clouds, respects the permutation invariance of points in the input, and provides a unified architecture for applications ranging from object classification and part segmentation to scene semantic parsing.
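Permutation invariance can be illustrated with a small sketch: apply the same function to every point independently, then aggregate with a symmetric operation such as an element-wise maximum, so the result does not depend on point order. The per-point function below (`pointwise_feature`) is a hand-written, hypothetical stand-in for a learned shared network and is not taken from the paper; only the structure (shared per-point map followed by max pooling) is the point of the example.

```python
from itertools import permutations


def pointwise_feature(p):
    """Hypothetical stand-in for a shared per-point network (not learned here)."""
    x, y, z = p
    return (x + y + z, x * x + y * y + z * z, max(x, y, z))


def global_feature(points):
    """Aggregate per-point features with an element-wise max (symmetric)."""
    feats = [pointwise_feature(p) for p in points]
    return tuple(max(f[i] for f in feats) for i in range(len(feats[0])))


cloud = [(0.0, 1.0, 2.0), (3.0, -1.0, 0.5), (-2.0, 0.0, 1.0)]
reference = global_feature(cloud)

# Every reordering of the same points yields the same global feature.
all_equal = all(
    global_feature(list(perm)) == reference for perm in permutations(cloud)
)
```

Because the maximum is taken over the set of per-point features, shuffling the input list leaves `reference` unchanged, which is the property the summary above refers to.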