Corpus ID: 232270024

SparsePoint: Fully End-to-End Sparse 3D Object Detector

@article{liu2021sparsepoint,
  title={SparsePoint: Fully End-to-End Sparse 3D Object Detector},
  author={Zili Liu and Guodong Xu and Honghui Yang and Haifeng Liu and Deng Cai},
}
Object detectors based on sparse object proposals have recently been proven to be successful in the 2D domain, which makes it possible to establish a fully end-to-end detector without time-consuming post-processing. This development is also attractive for 3D object detectors. However, considering the remarkably larger search space in the 3D domain, whether it is feasible to adopt the sparse method in the 3D object detection setting is still an open question. In this paper, we propose…
1 Citation
Multi-Modality Task Cascade for 3D Object Detection
A novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes; it shows that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.


A Hierarchical Graph Network for 3D Object Detection on Point Clouds
A new graph convolution (GConv) based hierarchical graph network (HGNet) for 3D object detection, which processes raw point clouds directly to predict 3D bounding boxes and outperforms state-of-the-art methods on two large-scale point cloud datasets.
H3DNet: 3D Object Detection Using Hybrid Geometric Primitives
This work introduces H3DNet, which takes a colorless 3D point cloud as input and outputs a collection of oriented object bounding boxes and their semantic labels, and shows how to convert the predicted geometric primitives into object proposals by defining a distance function between an object and the geometric primitives.
Deep Hough Voting for 3D Object Detection in Point Clouds
This work proposes VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting that achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size, and high efficiency.
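The voting idea behind VoteNet can be sketched in a few lines. This is a toy numpy illustration, not the authors' implementation: the object centre, the surface seeds, and the offset noise are all made-up stand-ins for what the network would learn.

```python
import numpy as np

# Toy sketch of deep Hough voting: each surface "seed" point predicts an
# offset toward its object centre, so the resulting votes cluster tightly
# even though the seeds themselves lie scattered on the object surface.
rng = np.random.default_rng(0)

centre = np.array([2.0, 1.0, 0.5])                    # hypothetical object centre
seeds = centre + rng.normal(scale=0.5, size=(64, 3))  # points near the surface

# In VoteNet the offsets are regressed by an MLP; here the exact displacement
# plus small noise stands in for a trained predictor.
offsets = (centre - seeds) + rng.normal(scale=0.02, size=seeds.shape)
votes = seeds + offsets

# Votes concentrate near the true centre, so even a plain mean recovers it;
# the real network clusters votes and regresses boxes from each cluster.
estimate = votes.mean(axis=0)
print(np.linalg.norm(estimate - centre))  # small residual
```

The point of voting over direct box regression is that surface points individually carry only partial evidence, but their aggregated votes localize the (unobserved) object centre robustly.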
Matterport3D: Learning from RGB-D Data in Indoor Environments
Matterport3D is introduced, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes that enable a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
A hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set and proposes novel set learning layers to adaptively combine features from multiple scales to learn deep point set features efficiently and robustly.
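The nested partitioning in PointNet++ starts from farthest point sampling, which picks well-spread centroids for each local region. A minimal sketch of that subsampling step (illustrative only, not the paper's code) could look like:

```python
import numpy as np

def farthest_point_sample(points: np.ndarray, k: int) -> np.ndarray:
    """Greedily pick k indices so the chosen points are spread far apart."""
    n = points.shape[0]
    chosen = [0]                    # start from an arbitrary point
    dist = np.full(n, np.inf)       # distance to the nearest chosen point
    for _ in range(k - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))  # farthest from all chosen so far
    return np.array(chosen)

pts = np.random.default_rng(1).random((100, 3))
idx = farthest_point_sample(pts, 8)
print(idx.shape)  # (8,)
```

Each sampled centroid then gathers its neighbours (by radius or k-NN) and a shared PointNet summarizes the group, which is what lets the network build features at progressively coarser scales.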
3D Semantic Parsing of Large-Scale Indoor Spaces
This paper argues that identification of structural elements in indoor spaces is essentially a detection problem, rather than segmentation as is commonly used, and proposes a method for semantic parsing of the 3D point cloud of an entire building using a hierarchical approach.
SUN RGB-D: A RGB-D scene understanding benchmark suite
This paper introduces an RGB-D benchmark suite with the goal of advancing the state of the art in all major scene understanding tasks, and presents a dataset that enables training data-hungry algorithms for scene-understanding tasks, evaluating them with meaningful 3D metrics, avoiding overfitting to a small testing set, and studying cross-sensor bias.
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference, can achieve better performance than DETR (especially on small objects) with 10× fewer training epochs.
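The sparse sampling that makes deformable attention cheap can be illustrated with a toy numpy example. This is only a sketch of the idea: the feature map, reference point, offsets, and weights below are random stand-ins for quantities the model would predict.

```python
import numpy as np

# Toy sketch of deformable attention: instead of attending to every spatial
# location, each query samples a handful of offsets around its reference
# point and mixes only the values found there.
rng = np.random.default_rng(2)
feature_map = rng.random((16, 16, 8))   # H x W x C value features
reference = np.array([7.0, 7.0])        # the query's reference location

n_samples = 4
offsets = rng.normal(scale=1.5, size=(n_samples, 2))  # learned in the model; random here
weights = rng.random(n_samples)
weights /= weights.sum()                # attention weights sum to 1

# Nearest-neighbour lookup at the sampled locations (the real model uses
# bilinear interpolation so the offsets stay differentiable).
locs = np.clip(np.round(reference + offsets).astype(int), 0, 15)
sampled = feature_map[locs[:, 0], locs[:, 1]]   # (n_samples, C)
out = weights @ sampled                          # aggregated feature, shape (C,)
print(out.shape)  # (8,)
```

The cost per query is O(n_samples) rather than O(H·W), which is why this scales to high-resolution feature maps and small objects where dense attention in DETR struggles.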
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Sparse R-CNN demonstrates accuracy, run-time, and training convergence performance on par with well-established detector baselines on the challenging COCO dataset, e.g., achieving 44.5 AP with the standard 3× training schedule and running at 22 fps using a ResNet-50 FPN model.
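What makes detectors like Sparse R-CNN "fully end-to-end" is one-to-one assignment between a small fixed set of learnable proposals and the ground-truth objects, which removes the need for NMS. A toy numpy sketch of that assignment step (illustrative only; the boxes are random, and the real loss uses Hungarian matching with classification and GIoU cost terms):

```python
import numpy as np

# A small fixed set of learnable proposal boxes vs. a few ground-truth boxes.
rng = np.random.default_rng(3)
proposals = rng.random((10, 4))   # 10 proposal boxes (x1, y1, x2, y2)
gt = rng.random((3, 4))           # 3 ground-truth boxes

# Pairwise L1 cost between every proposal and every ground truth.
cost = np.abs(proposals[:, None, :] - gt[None, :, :]).sum(axis=-1)

# Greedy one-to-one assignment as a stand-in for the Hungarian algorithm:
# each ground truth claims its cheapest still-free proposal, so every object
# is predicted exactly once and duplicate-removal post-processing is unneeded.
free = set(range(len(proposals)))
matches = []
for j in range(len(gt)):
    i = min(free, key=lambda p: cost[p, j])
    matches.append((i, j))
    free.discard(i)
print(matches)
```

SparsePoint's question, per the abstract above, is whether this sparse-proposal recipe survives the much larger search space of 3D scenes.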