Delving into Localization Errors for Monocular 3D Object Detection

@inproceedings{Ma2021DelvingIL,
  title={Delving into Localization Errors for Monocular 3D Object Detection},
  author={Xinzhu Ma and Yinmin Zhang and Dan Xu and Dongzhan Zhou and Shuai Yi and Haojie Li and Wanli Ouyang},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={4719-4728}
}
Estimating 3D bounding boxes from monocular images is an essential component of autonomous driving, yet accurate 3D object detection from this kind of data is very challenging. In this work, through intensive diagnostic experiments, we quantify the impact introduced by each sub-task and find that the ‘localization error’ is the vital factor restricting monocular 3D detection. Besides, we also investigate the underlying reasons behind localization errors, analyze the issues they might bring, and…
Homography Loss for Monocular 3D Object Detection
TLDR
A differentiable loss function, termed Homography Loss, is proposed to achieve this goal. It exploits both 2D and 3D information, aiming to balance the positional relationships between different objects via global constraints, so as to obtain more accurately predicted 3D boxes.
MonoGround: Detecting Monocular 3D Objects from the Ground
TLDR
The ground plane prior serves as an additional geometric condition for the ill-posed 2D-to-3D mapping and an extra source of information for depth estimation in monocular 3D object detection, yielding more accurate depth estimates from the ground.
WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection
TLDR
A weakly supervised monocular 3D detection method is explored: it first detects 2D boxes on the image and adopts the corresponding RoI LiDAR points as weak supervision, then uses a network to predict 3D boxes that tightly align with the associated RoI LiDAR points.
Shape-Aware Monocular 3D Object Detection
TLDR
A single-stage monocular 3D object detection model is proposed with an instance-segmentation head integrated into the model training, which allows the model to be aware of the visible shape of a target object.
ImpDet: Exploring Implicit Fields for 3D Object Detection
TLDR
This work proposes a framework, termed Implicit Detection (ImpDet), which leverages implicit learning for 3D object detection, and presents a simple yet effective virtual sampling strategy to address the sparsity of points on the object surface.
AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection
TLDR
This work employs a deep neural network to learn distinctive 2D keypoints in the image domain and regress their corresponding 3D coordinates in the local 3D object coordinate system, incorporating shape-aware 2D/3D constraints into the 3D detection framework.
MonoDistill: Learning Spatial Features for Monocular 3D Object Detection
TLDR
A simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors, without introducing any extra cost in the inference phase, and can significantly boost the performance of the baseline model.
Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection
TLDR
The proposed MonoCon method is motivated at a high level by the Cramér–Wold theorem in measure theory; it outperforms all prior arts on the leaderboard for the car category and obtains comparable accuracy on the pedestrian and cyclist categories.
3D Object Detection from Images for Autonomous Driving: A Survey
TLDR
This paper provides the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection and deeply analyzing each of their components and proposing two new taxonomies to organize the state-of-the-art methods into different categories.
Exploring Geometric Consistency for Monocular 3D Object Detection
TLDR
The proposed augmentation techniques lead to improvements on the KITTI and nuScenes monocular 3D detection benchmarks with state-of-the-art results and are well suited for semi-supervised training and cross-dataset generalization.
...
...

References

SHOWING 1-10 OF 47 REFERENCES
SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
TLDR
This paper argues that a 2D detection network is redundant and introduces non-negligible noise for 3D detection, and proposes a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables.
MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships
TLDR
This work proposes a novel method to improve the monocular 3D object detection by considering the relationship of paired samples, which allows us to encode spatial constraints for partially-occluded objects from their adjacent neighbors.
GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
TLDR
This work leverages the off-the-shelf 2D object detector to efficiently obtain a coarse cuboid for each predicted 2D box and explores the 3D structure information of the object by employing the visual features of visible surfaces.
Disentangling Monocular 3D Object Detection
TLDR
An approach for monocular 3D object detection from a single RGB image, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes is proposed.
IoU Loss for 2D/3D Object Detection
TLDR
By integrating the implemented IoU loss into several state-of-the-art 3D object detectors, consistent improvements are achieved for both bird's-eye-view 2D detection and point cloud 3D detection on the public KITTI [3] benchmark.
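As a rough illustration of the idea (not the paper's implementation, which also handles rotated bird's-eye-view and 3D boxes), an IoU loss for axis-aligned 2D boxes can be sketched as:

```python
def iou_loss(box_a, box_b):
    """IoU loss for axis-aligned 2D boxes given as (x1, y1, x2, y2).

    Loss = 1 - IoU: perfectly overlapping boxes give 0,
    disjoint boxes give 1.
    """
    # Intersection rectangle (clamped to zero size if boxes are disjoint).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union = sum of areas minus the intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return 1.0 - inter / union
```

Because the loss directly optimizes the overlap metric used at evaluation time, it couples the box coordinates instead of regressing each one independently.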
Monocular 3D Object Detection for Autonomous Driving
TLDR
This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground-plane, and achieves the best detection performance on the challenging KITTI benchmark, among published monocular competitors.
Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving
TLDR
This paper proposes a monocular 3D object detection framework for autonomous driving, along with a multi-modal feature fusion module to embed the complementary RGB cue into the generated point cloud representation.
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization
TLDR
This work proposes a novel IDE method that directly predicts the depth of the target 3D bounding box's center using sparse supervision, and demonstrates that MonoGRNet achieves state-of-the-art performance on challenging datasets.
Disentangling Monocular 3D Object Detection: From Single to Multi-Class Recognition
TLDR
A method for multi-class, monocular 3D object detection from a single RGB image, which exploits a novel disentangling transformation and a novel, self-supervised confidence estimation method for predicted 3D bounding boxes, demonstrating its ability to generalize for different types of objects.
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
  • G. Brazil, Xiaoming Liu
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
TLDR
M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
...
...