RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection

@inproceedings{rao2021randomrooms,
  title={RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection},
  author={Yongming Rao and Benlin Liu and Yi Wei and Jiwen Lu and Cho-Jui Hsieh and Jie Zhou},
  booktitle={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}
3D point cloud understanding has made great progress in recent years. However, one major bottleneck is the scarcity of annotated real datasets, especially compared to 2D object detection tasks, since a large amount of labor is involved in annotating the real scans of a scene. A promising solution to this problem is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets. This can be achieved by the pre-training and fine-tuning… 
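The core idea in the abstract, composing synthetic CAD shapes into randomized room layouts to obtain free supervision, can be sketched roughly as follows. This is a minimal illustration assuming NumPy; the function name and the simplifications (independent placement, no collision handling or physical plausibility checks, which the actual method would need) are ours, not the paper's:

```python
import numpy as np

def random_room(objects, floor_size=8.0, seed=0):
    """Compose a pseudo indoor scene by scattering synthetic object
    point clouds over a square floor at random positions and yaw angles.

    `objects` is a list of (N_i, 3) arrays, e.g. points sampled from
    CAD models. Returns the merged scene points plus per-point instance
    ids, which serve as free labels for unsupervised pre-training.
    """
    rng = np.random.default_rng(seed)
    scene, inst = [], []
    for i, pts in enumerate(objects):
        theta = rng.uniform(0.0, 2.0 * np.pi)  # random yaw about z
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        placed = pts @ rot.T
        xy = rng.uniform(-floor_size / 2, floor_size / 2, size=2)
        placed[:, :2] += xy                    # translate on the floor
        placed[:, 2] -= placed[:, 2].min()     # rest on the ground plane
        scene.append(placed)
        inst.append(np.full(len(pts), i))
    return np.concatenate(scene), np.concatenate(inst)
```

A network pre-trained on such generated scenes can then be fine-tuned on real scans, which is the paradigm the abstract refers to.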


Language-Grounded Indoor 3D Semantic Segmentation in the Wild
A language-driven pre-training method that encourages learned 3D features for categories with limited training examples to lie close to their pre-trained text embeddings, and consistently outperforms state-of-the-art 3D pre-training for 3D semantic segmentation on a proposed benchmark.
Masked Discrimination for Self-Supervised Learning on Point Clouds
This paper proposes a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds, to represent the point cloud as discrete occupancy values, and performs simple binary classification between masked object points and sampled noise points as the proxy task.
Implicit Autoencoder for Point Cloud Self-supervised Representation Learning
Implicit Autoencoder (IAE) is introduced, a simple yet effective method that addresses the challenge of autoencoding on point clouds by replacing the point cloud decoder with an implicit decoder that outputs a continuous representation that is shared among different point cloud sampling of the same model.
Unsupervised Representation Learning for Point Clouds: A Survey
This paper provides a comprehensive review of unsupervised point cloud representation learning using DNNs and quantitatively benchmark and discuss the reviewed methods over multiple widely adopted point cloud datasets.


3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions
3DMatch is presented, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data that consistently outperforms other state-of-the-art approaches by a significant margin.
2D-Driven 3D Object Detection in RGB-D Images
The approach makes best use of the 2D information to quickly reduce the search space in 3D, benefiting from state-of-the-art 2D object detection techniques, and hints that 2D-driven object detection in 3D should be further explored, especially when the 3D input is sparse.
Frustum PointNets for 3D Object Detection from RGB-D Data
This work directly operates on raw point clouds by popping up RGB-D scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall even for small objects.
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
  • S. Song, Jianxiong Xiao. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
This work proposes the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D.
Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds
This work hypothesizes that a powerful representation of a 3D object should model the attributes that are shared between parts and the whole object, and distinguishable from other objects, and proposes to learn point cloud representation by bidirectional reasoning between the local structures at different abstraction hierarchies and the global shape without human supervision.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
Three-Dimensional Object Detection and Layout Prediction Using Clouds of Oriented Gradients
A cloud of oriented gradient (COG) descriptor is proposed that links the 2D appearance and 3D pose of object categories, and thus accurately models how perspective projection affects perceived image boundaries, in 3D object detection and spatial layout prediction in cluttered indoor scenes.