Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection

@inproceedings{graph-detr3d,
  title={Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection},
  author={Zehui Chen and Zhenyu Li and Shiquan Zhang and Liangji Fang and Qinhong Jiang and Feng Zhao},
}
3D object detection from multiple image views is a fundamental and challenging task for visual scene understanding. Due to its low cost and high efficiency, multi-view 3D object detection has demonstrated promising application prospects. However, accurately detecting objects through perspective views in the 3D space is extremely difficult due to the lack of depth information. Recently, DETR3D [45] introduces a novel 3D-2D query paradigm in aggregating multi-view images for 3D object detection…
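The 3D-2D query paradigm mentioned above can be illustrated with a minimal NumPy sketch: a 3D query reference point is projected into each camera view and image features are gathered at the projected locations. The function name, nearest-neighbour sampling, and mean aggregation below are illustrative simplifications, not the paper's actual implementation (which uses bilinear sampling and learned fusion):

```python
import numpy as np

def sample_view_features(ref_point_3d, cam_projections, feature_maps):
    """Project a 3D query reference point into each camera view and
    average the feature vectors sampled at the valid projections.

    ref_point_3d:    (3,) point in the ego/world frame.
    cam_projections: list of (3, 4) camera projection matrices.
    feature_maps:    list of (C, H, W) arrays, one per camera.
    """
    point_h = np.append(ref_point_3d, 1.0)       # homogeneous coordinates
    samples = []
    for P, feat in zip(cam_projections, feature_maps):
        uvw = P @ point_h                        # project onto the image plane
        if uvw[2] <= 0:                          # point is behind this camera
            continue
        u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]  # perspective divide
        C, H, W = feat.shape
        if not (0 <= u < W and 0 <= v < H):      # outside the image: skip
            continue
        samples.append(feat[:, int(v), int(u)])  # nearest-neighbour sample
    if not samples:
        return np.zeros(feature_maps[0].shape[0])
    return np.mean(samples, axis=0)              # aggregate across views
```

Points falling in overlapping regions are sampled from several cameras at once, which is exactly the situation Graph-DETR3D revisits.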

Towards Model Generalization for Monocular 3D Object Detection
The 2D-3D geometry-consistent object scaling strategy (GCOS) is proposed to bridge the domain gap via instance-level augmentation; it achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme even without utilizing data on the target domain.
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
This top-down approach outperforms its bottom-up counterpart in which object bounding box prediction follows per-pixel depth estimation, since it does not suffer from the compounding error introduced by a depth prediction model.
CenterNet: Keypoint Triplets for Object Detection
This paper presents an efficient solution that explores the visual patterns within individual cropped regions with minimal costs, and builds the framework upon a representative one-stage keypoint-based detector named CornerNet, which improves both precision and recall.
Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection
  • Li Wang, Liang Du, Li Zhang
  • 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A depth-conditioned dynamic message propagation (DDMP) network is proposed to effectively integrate multi-scale depth information with the image context by dynamically predicting hybrid depth-dependent filter weights and affinity matrices for propagating information.
Object DGCNN: 3D Object Detection using Dynamic Graphs
Object DGCNN is introduced, a streamlined architecture for 3D object detection from point clouds that removes the necessity of post-processing via object confidence aggregation or non-maximum suppression and provides a set-to-set distillation approach customized to 3D detection.
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
The BEVDet paradigm is contributed, which performs 3D object detection in bird's-eye view (BEV), where most target values are defined and route planning can be handily performed; its performance is substantially improved by an exclusive data augmentation strategy and an upgraded Non-Maximum Suppression strategy.
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization
This work proposes a novel IDE method that directly predicts the depth of the target 3D bounding box's center using sparse supervision, and demonstrates that MonoGRNet achieves state-of-the-art performance on challenging datasets.
Frustum PointNets for 3D Object Detection from RGB-D Data
This work directly operates on raw point clouds by popping up RGB-D scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.
PointAugmenting: Cross-Modal Augmentation for 3D Object Detection
PointAugmenting decorates point clouds with corresponding point-wise CNN features extracted by pretrained 2D detection models, and then performs 3D object detection over the decorated point clouds and achieves the new state-of-the-art results on the nuScenes leaderboard to date.
3D Object Detection with Pointformer
This paper proposes Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively, and introduces an efficient coordinate refinement module to shift down-sampled points closer to object centroids, which improves object proposal generation.
End-to-End Object Detection with Transformers
This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster R-CNN baseline on the challenging COCO object detection dataset.
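The set-prediction formulation above hinges on a one-to-one bipartite matching between predictions and ground truth, solved with the Hungarian algorithm. A minimal sketch using SciPy's solver follows; the plain L1 box cost is a deliberate simplification of DETR's full matching cost (classification + L1 + GIoU terms), and the function name is illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_boxes, gt_boxes):
    """One-to-one assignment of predicted boxes to ground-truth boxes.

    pred_boxes: (N, 4) array; gt_boxes: (M, 4) array with N >= M.
    Returns a list of (pred_index, gt_index) pairs minimizing total cost.
    """
    # Pairwise L1 distance between every prediction and every ground truth.
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (N, M)
    pred_idx, gt_idx = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))
```

Because the assignment is one-to-one, duplicate detections receive no ground-truth match and are trained toward "no object", which is what lets DETR drop non-maximum suppression.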