PLUMENet: Efficient 3D Object Detection from Stereo Images

  • Yan Wang, Binh Yang, Rui Hu, Ming Liang, Raquel Urtasun
  • Published 17 January 2021
  • Computer Science
  • 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
3D object detection is a key component of many robotic applications such as self-driving vehicles. While many approaches rely on expensive 3D sensors such as LiDAR to produce accurate 3D estimates, methods that exploit stereo cameras have recently shown promising results at a lower cost. Existing approaches tackle this problem in two steps: first depth estimation from stereo images is performed to produce a pseudo LiDAR point cloud, which is then used as input to a 3D object detector. However… 
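The first step of the two-step pipeline described above (stereo depth turned into a pseudo-LiDAR point cloud) can be sketched as follows. This is a minimal illustration of the existing approach the abstract refers to, not of PLUMENet itself. It assumes a rectified stereo pair, a single focal length for both image axes, and a disparity map that is already available; the function name `disparity_to_pseudo_lidar` and all parameter names are hypothetical.

```python
import numpy as np


def disparity_to_pseudo_lidar(disparity, focal_px, baseline_m, cx, cy,
                              min_disparity=1e-3):
    """Back-project an (H, W) disparity map to an (N, 3) point cloud.

    Assumptions (not taken from the paper): rectified stereo pair, one
    focal length in pixels for both axes, principal point (cx, cy), and
    points expressed in the left camera frame (x right, y down, z forward).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    height, width = disparity.shape

    # Pixels with near-zero disparity have no reliable depth and are dropped.
    valid = disparity > min_disparity

    # Rectified stereo geometry: depth = focal * baseline / disparity.
    depth = np.zeros_like(disparity)
    depth[valid] = focal_px * baseline_m / disparity[valid]

    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(width), np.arange(height))

    # Pinhole back-projection of each pixel to a 3D point.
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px

    points = np.stack([x, y, depth], axis=-1)
    return points[valid]


# Hypothetical usage with a constant disparity map.
example_points = disparity_to_pseudo_lidar(
    np.full((2, 3), 10.0), focal_px=100.0, baseline_m=0.5, cx=1.0, cy=0.5
)
```

The resulting array would then be passed to a LiDAR-style 3D detector as the second step. Because depth is inversely proportional to disparity, small disparity errors at long range translate into large depth errors in the point cloud.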
DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that the stereo modeling approach, DSGN++, consistently outperforms other camera-based 3D detectors for all categories.
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
This work proposes LIGA-Stereo (LiDAR Geometry Aware Stereo Detector) to learn stereo-based 3D detectors under the guidance of high-level geometry-aware representations of LiDAR-based detection models, and finds that existing voxel-based stereo detectors fail to learn semantic features effectively from indirect 3D supervision.
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
This work proposes a disparity-wise dynamic convolution with dynamic kernels sampled from the disparity feature map to filter the features adaptively from a single image for generating virtual image features, which eases the feature degradation caused by the depth estimation errors.
Stereo Neural Vernier Caliper
A new object-centric framework for learning-based stereo 3D object detection that achieves state-of-the-art performance on the KITTI benchmark is proposed.
Joint stereo 3D object detection and implicit surface reconstruction
This work proposes a new instance-level network, S-3D-RCNN, that addresses the unseen surface hallucination problem by extracting point-based representations from stereo regions of interest and inferring implicit shape codes with predicted complete surface geometry.
SGM3D: Stereo Guided Monocular 3D Object Detection
A stereo-guided monocular 3D object detection framework, dubbed SGM3D, adapting the robust 3D features learned from stereo inputs to enhance the feature for monocular detection is proposed, and an IoU matching-based alignment method for object-level domain adaptation between the stereo and monocular predictions is introduced to alleviate the mismatches while adopting the MG-DA.
Scalable Primitives for Generalized Sensor Fusion in Autonomous Vehicles
This work proposes a new end to end architecture, Generalized Sensor Fusion (GSF), which is designed in such a way that both sensor inputs and target tasks are modular and modifiable, which paves the way for the industry to jointly design hardware and software architectures as well as large fleets with heterogeneous configurations.
VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion
VPFNet is presented, a new architecture that aligns and aggregates the point cloud and image data at the ‘virtual’ points, bridging the resolution gap between the two sensors and thus preserving more information for processing.


Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
This paper proposes to convert image-based depth maps to pseudo-LiDAR representations, essentially mimicking the LiDAR signal, and achieves impressive improvements over the existing state-of-the-art in image-based performance.
Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving
This paper provides substantial advances to the pseudo-LiDAR framework through improvements in stereo depth estimation, and proposes a depth-propagation algorithm, guided by the initial depth estimates, to diffuse these few exact measurements across the entire depth map.
3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection
This paper employs a convolutional neural net that exploits context and depth information to jointly regress to 3D bounding box coordinates and object pose and outperforms all existing results in object detection and orientation estimation tasks for all three KITTI object classes.
Confidence Guided Stereo 3D Object Detection with Split Depth Estimation
CG-Stereo is proposed, a confidence-guided stereo 3D object detection pipeline that uses separate decoders for foreground and background pixels during depth estimation, and leverages the confidence estimation from the depth estimation network as a soft attention mechanism in the 3D object detector.
GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
This work leverages the off-the-shelf 2D object detector to efficiently obtain a coarse cuboid for each predicted 2D box and explores the 3D structure information of the object by employing the visual features of visible surfaces.
PIXOR: Real-time 3D Object Detection from Point Clouds
PIXOR is proposed, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions and surpasses other state-of-the-art methods notably in terms of Average Precision (AP), while still running at 10 FPS.
End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection
A new framework based on differentiable Change of Representation (CoR) modules allows the entire PL pipeline to be trained end-to-end; it is compatible with most state-of-the-art networks for both tasks and, in combination with PointRCNN, improves over PL consistently across all benchmarks.
Multi-view 3D Object Detection Network for Autonomous Driving
This paper proposes Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes and designs a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths.
Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation
  • Jiaming Sun, Linghao Chen, H. Bao
  • Computer Science, Environmental Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
It is proposed to use a statistical shape model to generate dense disparity pseudo-ground-truth without the need of LiDAR point clouds, which makes the Disp R-CNN system more widely applicable.
ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection
A novel framework named ZoomNet for stereo imagery-based 3D detection, which surpasses all previous state-of-the-art methods by large margins; it learns part locations as complementary features to improve resistance to occlusion and puts forward a 3D fitting score to better estimate 3D detection quality.