Learning Depth-Guided Convolutions for Monocular 3D Object Detection
  • Mingyu Ding, Yuqi Huo, Hongwei Yi, Zhe Wang, Jianping Shi, Zhiwu Lu, Ping Luo
  • Published 10 December 2019
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture local objects and their scale information, which are vital for 3D object detection. To better represent 3D structure, prior works typically transform depth maps estimated from 2D images into a pseudo-LiDAR representation and then apply existing 3D point-cloud-based object detectors. However…
Deep Learning-Based Monocular 3D Object Detection with Refinement of Depth Information
This work proposes a novel method based on joint image segmentation and geometric constraints to predict the target depth and provide a confidence measure for the depth prediction; it outperforms various state-of-the-art methods on the challenging KITTI dataset.
Is Pseudo-LiDAR Needed for Monocular 3D Object Detection?
This work proposes an end-to-end, single stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations.
OCM3D: Object-Centric Monocular 3D Object Detection
It is argued that the local RoI information from the object image patch, together with a proper resizing scheme, is a better input, as it provides complete semantic clues while excluding irrelevant interference; the work also decomposes the confidence mechanism in monocular 3D object detection by considering the relationship between 3D objects and their associated 2D boxes.
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
This work proposes a disparity-wise dynamic convolution, with dynamic kernels sampled from the disparity feature map, to adaptively filter the features from a single image when generating virtual image features; this eases the feature degradation caused by depth estimation errors.
MonoDistill: Learning Spatial Features for Monocular 3D Object Detection
A simple and effective scheme is proposed to introduce spatial information from LiDAR signals into monocular 3D detectors without any extra cost in the inference phase; it significantly boosts the performance of the baseline model.
Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth
This work proposes a rendering module that augments the training data by synthesizing images with virtual depths, and an auxiliary module that improves the detection model by jointly optimizing it through a depth estimation task.
GAC3D: improving monocular 3D object detection with ground-guide model and adaptive convolution
Monocular 3D object detection has recently become prevalent in autonomous driving and navigation applications due to its cost efficiency and easy integration into existing vehicles. The most challenging…
FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
A general framework, FCOS3D, is proposed that gets rid of any 2D detection or 2D-3D correspondence priors; the solution achieves 1st place among all vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
A novel framework for monocular 3D object detection with a depth-guided Transformer, named MonoDETR, which is an end-to-end network without extra data or NMS post-processing and achieves state-of-the-art performance on the KITTI benchmark with significant gains.
MonoGRNet: A General Framework for Monocular 3D Object Detection
The task decomposition significantly facilitates monocular 3D object detection, allowing the target 3D bounding boxes to be efficiently predicted in a single forward pass, without using object proposals, post-processing, or the computationally expensive pixel-level depth estimation utilized by previous methods.


Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
This paper proposes to convert image-based depth maps to pseudo-LiDAR representations, essentially mimicking the LiDAR signal, and achieves impressive improvements over the existing state of the art in image-based performance.
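The pseudo-LiDAR conversion described above is a standard pinhole back-projection: each pixel with an estimated depth is lifted to a 3D point using the camera intrinsics. A minimal sketch (not the paper's implementation; the function name and the flat intrinsics parameters are assumptions for illustration):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H, W) into an (N, 3) point cloud
    using the pinhole camera model. Intrinsics fx, fy, cx, cy are assumed
    known (e.g. from a calibration file such as KITTI's)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    z = depth
    x = (u - cx) * z / fx   # camera x (right)
    y = (v - cy) * z / fy   # camera y (down)
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with invalid depth
```

The resulting (N, 3) array can be fed to any point-cloud-based 3D detector, which is exactly the pipeline this line of work builds on.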
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
Frustum PointNets for 3D Object Detection from RGB-D Data
This work directly operates on raw point clouds obtained by popping up RGB-D scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall even for small objects.
Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving
This paper proposes a monocular 3D object detection framework in the domain of autonomous driving, and proposes a multi-modal feature fusion module to embed the complementary RGB cue into the generated point clouds representation.
Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud
  • Xinshuo Weng, Kris Kitani
  • 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
This work aims at bridging the performance gap between 3D sensing and 2D sensing for 3D object detection by enhancing LiDAR-based algorithms to work with single-image input via a pseudo-LiDAR point cloud.
3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection
This paper employs a convolutional neural net that exploits context and depth information to jointly regress to 3D bounding box coordinates and object pose and outperforms all existing results in object detection and orientation estimation tasks for all three KITTI object classes.
Orthographic Feature Transform for Monocular 3D Object Detection
The orthographic feature transform is introduced, which enables us to escape the image domain by mapping image-based features into an orthographic 3D space and allows us to reason holistically about the spatial configuration of the scene in a domain where scale is consistent and distances between objects are meaningful.
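The mapping described above can be sketched concretely: each cell of an orthographic (bird's-eye-view) grid is projected into the image with the camera intrinsics, and the image feature at the projected pixel is copied into the cell. A minimal nearest-neighbour sketch (the paper pools over each voxel's full projected footprint; the function name and argument layout here are assumptions):

```python
import numpy as np

def orthographic_feature_transform(feat, K, grid_xyz):
    """Populate an orthographic 3D grid with image features.
    feat: (H, W, C) image feature map; K: (3, 3) camera intrinsics;
    grid_xyz: (N, 3) grid-cell centres in camera coordinates."""
    h, w, c = feat.shape
    proj = grid_xyz @ K.T                  # homogeneous pixel coords
    uv = proj[:, :2] / proj[:, 2:3]        # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    out = np.zeros((grid_xyz.shape[0], c), dtype=feat.dtype)
    valid = grid_xyz[:, 2] > 0             # cells in front of the camera
    out[valid] = feat[v[valid], u[valid]]  # nearest-neighbour sampling
    return out
```

Because the grid lives in metric 3D space, scale is consistent across cells regardless of how far each cell projects into the image, which is the property the transform is designed to exploit.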
GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
This work leverages the off-the-shelf 2D object detector to efficiently obtain a coarse cuboid for each predicted 2D box and explores the 3D structure information of the object by employing the visual features of visible surfaces.
MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization
This work proposes a novel instance depth estimation (IDE) method that directly predicts the depth of the target 3D bounding box's center using sparse supervision, and demonstrates that MonoGRNet achieves state-of-the-art performance on challenging datasets.
Shift R-CNN: Deep Monocular 3D Object Detection With Closed-Form Geometric Constraints
The novel, geometrically constrained deep learning approach to monocular 3D object detection obtains top results on KITTI 3D Object Detection Benchmark, being the best among all monocular methods that do not use any pre-trained network for depth estimation.