Corpus ID: 238744328

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries

Yue Wang, Vitor Campanholo Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, Justin Solomon
We introduce a framework for multi-camera 3D object detection. In contrast to existing works, which estimate 3D bounding boxes directly from monocular images or use depth prediction networks to generate input for 3D object detection from 2D information, our method manipulates predictions directly in 3D space. Our architecture extracts 2D features from multiple camera images and then uses a sparse set of 3D object queries to index into these 2D features, linking 3D positions to multi-view images… 
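The core step in the abstract, where a 3D query indexes into 2D features from several cameras, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on simplifying assumptions: each camera has a known 3x4 projection matrix that maps a 3D reference point directly to pixel coordinates of that camera's feature map, features come from a single scale and are sampled bilinearly, and the samples are averaged over the cameras in which the point is visible. The helper names (`project`, `bilinear`, `gather_query_feature`) are hypothetical, and feature maps are plain nested lists of shape H x W x C.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]    # assumed 3x4 projection matrix per camera
FeatureMap = List[List[List[float]]]  # H x W x C nested lists

def project(P: Matrix, X: Tuple[float, float, float]) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 matrix; None if the point is not in front of the camera."""
    x, y, z = X
    h = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    if h[2] <= 1e-6:  # non-positive depth: not visible in this view
        return None
    return h[0] / h[2], h[1] / h[2]

def bilinear(feat: FeatureMap, u: float, v: float) -> Optional[List[float]]:
    """Bilinearly interpolate a C-dim feature at continuous pixel coordinates (u, v)."""
    H, W = len(feat), len(feat[0])
    if not (0.0 <= u <= W - 1 and 0.0 <= v <= H - 1):
        return None  # projects outside this feature map
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    dx, dy = u - x0, v - y0
    out = []
    for c in range(len(feat[0][0])):
        top = feat[y0][x0][c] * (1 - dx) + feat[y0][x1][c] * dx
        bottom = feat[y1][x0][c] * (1 - dx) + feat[y1][x1][c] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out

def gather_query_feature(ref_point, cameras) -> Optional[List[float]]:
    """Average the features sampled from every camera (P, feat) in which ref_point is visible."""
    samples = []
    for P, feat in cameras:
        uv = project(P, ref_point)
        if uv is None:
            continue
        s = bilinear(feat, *uv)
        if s is not None:
            samples.append(s)
    if not samples:
        return None  # no camera sees this reference point
    n = len(samples)
    return [sum(s[c] for s in samples) / n for c in range(len(samples[0]))]

# Usage: one camera looking down +z with identity intrinsics, a 3x3 single-channel map.
P_front = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
feat = [[[10.0 * y + x] for x in range(3)] for y in range(3)]
print(gather_query_feature((1.0, 0.5, 1.0), [(P_front, feat)]))  # [6.0]
```

Because each query samples only at the projected location of its reference point, the cost grows with the number of queries and cameras rather than with the size of a dense 3D grid. In a full detector of this kind, the reference point would itself be predicted from the query, and the gathered feature would be used to refine it.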


BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View
The BEVDet paradigm, developed by following the principle of detecting 3D objects in Bird-Eye-View (BEV), works well in multi-camera 3D object detection and offers a good trade-off between computing budget and performance.
SoK: Vehicle Orientation Representations for Deep Rotation Estimation
This work categorizes and compares the accuracy of various existing orientation representations on the KITTI 3D object detection dataset and proposes a new orientation representation: Tricosine.


Monocular 3D Object Detection via Geometric Reasoning on Keypoints
This paper proposes a novel keypoint-based approach for 3D object detection and localization from a single RGB image: it builds a multi-branch model around 2D keypoint detection in images and complements it with a conceptually simple geometric reasoning method.
SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
This paper argues that the 2D detection network is redundant and introduces non-negligible noise for 3D detection, and proposes a novel 3D object detection method, SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables.
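The SMOKE summary's idea of combining one keypoint with regressed 3D variables can be sketched as pinhole back-projection followed by box construction. This is an illustration under stated assumptions, not SMOKE's exact encoding: the keypoint is taken to be the image projection of the 3D box center, depth, dimensions, and yaw are assumed to be already regressed, the camera frame is KITTI-style (x right, y down, z forward, yaw about the y-axis), and the names `Intrinsics`, `backproject_center`, and `box_corners` are hypothetical. The actual SMOKE parameterization of depth, offsets, and orientation differs in detail.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]

@dataclass
class Intrinsics:
    fx: float  # focal lengths in pixels
    fy: float
    cx: float  # principal point in pixels
    cy: float

def backproject_center(K: Intrinsics, u: float, v: float, depth: float,
                       offset: Tuple[float, float] = (0.0, 0.0)) -> Point3:
    """Lift a projected 3D-center keypoint (u, v) and a regressed depth into camera coordinates."""
    uu, vv = u + offset[0], v + offset[1]  # optional sub-pixel correction (assumed)
    return ((uu - K.cx) * depth / K.fx, (vv - K.cy) * depth / K.fy, depth)

def box_corners(center: Point3, dims: Tuple[float, float, float], yaw: float) -> List[Point3]:
    """Eight corners of a box with dims (length, height, width), rotated by yaw about the y-axis."""
    l, h, w = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # rotation about y: x' = c*x + s*z, z' = -s*x + c*z
                corners.append((center[0] + c * dx + s * dz,
                                center[1] + dy,
                                center[2] - s * dx + c * dz))
    return corners

K = Intrinsics(fx=700.0, fy=700.0, cx=600.0, cy=200.0)
center = backproject_center(K, 670.0, 270.0, 10.0)
print(center)  # (1.0, 1.0, 10.0)
```

Only the keypoint and a handful of scalars are needed per object, which is what lets this style of detector skip a separate 2D box stage.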
Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction
MonoPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction, is presented, and a novel projection alignment loss is devised to jointly optimize these tasks in the neural network to improve 3D localization accuracy.
3D Bounding Box Estimation Using Deep Learning and Geometry
Although conceptually simple, this method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance-level segmentation, and flat ground priors, and produces state-of-the-art results for 3D viewpoint estimation on the Pascal 3D+ dataset.
Is Pseudo-Lidar needed for Monocular 3D Object detection?
This work proposes an end-to-end, single-stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations, and achieves state-of-the-art results on two challenging benchmarks.
Monocular Differentiable Rendering for Self-Supervised 3D Object Detection
This work presents a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects with the help of strong shape priors and 2D instance masks for 3D object detection from monocular images.
Orthographic Feature Transform for Monocular 3D Object Detection
The orthographic feature transform is introduced, which enables us to escape the image domain by mapping image-based features into an orthographic 3D space and allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful.
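The orthographic feature transform summary can be made concrete with a small sketch of one way to fill a 3D voxel with image features. It assumes, as a simplification, that each voxel's feature is the mean of a single-channel image feature map over the 2D bounding box of the voxel's eight projected corners, computed in constant time per voxel with an integral image (summed-area table). The 3x4 projection matrix, the cubic voxels, and the names `integral_image`, `box_mean`, and `voxel_feature` are hypothetical; later stages of the method are omitted.

```python
import math
from itertools import product

def integral_image(img):
    """Summed-area table S where S[y][x] is the sum of img over rows < y and columns < x."""
    H, W = len(img), len(img[0])
    S = [[0.0] * (W + 1) for _ in range(H + 1)]
    for y in range(H):
        for x in range(W):
            S[y + 1][x + 1] = img[y][x] + S[y][x + 1] + S[y + 1][x] - S[y][x]
    return S

def box_mean(S, x0, y0, x1, y1):
    """Mean of img over x0 <= x < x1, y0 <= y < y1, in O(1) using the summed-area table."""
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    return (S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]) / area

def voxel_feature(S, P, center, half):
    """Average image feature inside the 2D bounding box of a cubic voxel's projected corners."""
    H, W = len(S) - 1, len(S[0]) - 1
    us, vs = [], []
    for sx, sy, sz in product((-1, 1), repeat=3):
        X = (center[0] + sx * half, center[1] + sy * half, center[2] + sz * half)
        h = [sum(P[r][i] * X[i] for i in range(3)) + P[r][3] for r in range(3)]
        if h[2] <= 1e-6:
            return 0.0  # voxel (partly) behind the camera: no image support
        us.append(h[0] / h[2])
        vs.append(h[1] / h[2])
    # Clip the projected bounding box to the image before pooling.
    x0 = max(0, min(W, math.floor(min(us))))
    x1 = max(0, min(W, math.ceil(max(us))))
    y0 = max(0, min(H, math.floor(min(vs))))
    y1 = max(0, min(H, math.ceil(max(vs))))
    return box_mean(S, x0, y0, x1, y1)

S_small = integral_image([[1.0, 2.0], [3.0, 4.0]])
print(box_mean(S_small, 0, 0, 2, 2))  # 2.5
```

The summed-area table is what keeps pooling cheap: a near voxel projects to a large image box and a far voxel to a small one, yet each costs the same four lookups, which matches the summary's point that distances stay meaningful in the orthographic space.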
FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
This work proposes a general framework, FCOS3D, that dispenses with any 2D detection or 2D-3D correspondence priors; the solution achieved 1st place among all vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
Monocular 3D Object Detection for Autonomous Driving
This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground-plane, and achieves the best detection performance on the challenging KITTI benchmark, among published monocular competitors.
SSD: Single Shot MultiBox Detector
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
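The default-box construction in the SSD summary can be made concrete with a short sketch. The constants follow the choices reported in the SSD paper: scales spaced linearly from s_min = 0.2 to s_max = 0.9 across the m feature maps, aspect ratios {1, 2, 3, 1/2, 1/3}, width s·√a and height s/√a, and one extra square box of scale √(s_k·s_{k+1}) per location, with centers at cell midpoints. Treating s_{m+1} as a linear extrapolation for the last feature map is an assumption of this sketch, boxes are not clipped to the unit square, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to the image size

def ssd_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced scales s_1..s_m, plus an extrapolated s_{m+1} (assumption) for the extra box."""
    step = (s_max - s_min) / (m - 1) if m > 1 else 0.0
    return [s_min + step * (k - 1) for k in range(1, m + 2)]

def default_boxes(fmap_sizes: Sequence[int],
                  aspect_ratios: Sequence[float] = (1.0, 2.0, 3.0, 1 / 2, 1 / 3),
                  s_min: float = 0.2, s_max: float = 0.9) -> List[Box]:
    """Default boxes for square feature maps of the given sizes, finest map first."""
    scales = ssd_scales(len(fmap_sizes), s_min, s_max)
    boxes: List[Box] = []
    for k, f in enumerate(fmap_sizes):
        s = scales[k]
        s_extra = math.sqrt(s * scales[k + 1])  # extra square box between this scale and the next
        for i in range(f):        # row index
            for j in range(f):    # column index
                cx, cy = (j + 0.5) / f, (i + 0.5) / f
                for a in aspect_ratios:
                    boxes.append((cx, cy, s * math.sqrt(a), s / math.sqrt(a)))
                boxes.append((cx, cy, s_extra, s_extra))
    return boxes

boxes = default_boxes([2, 1])
print(len(boxes))  # 30 = (2*2 + 1*1) locations x 6 boxes each
```

Assigning small scales to fine feature maps and large scales to coarse ones is what lets a single forward pass cover objects of very different sizes.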