• Corpus ID: 246015405

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection

  • Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinghong Jiang, Feng Zhao, Bolei Zhou, Hang Zhao
Object detection through either RGB images or LiDAR point clouds has been extensively explored in autonomous driving. However, it remains challenging to make these two data sources complementary and beneficial to each other. In this paper, we propose AutoAlign, an automatic feature fusion strategy for 3D object detection. Instead of establishing deterministic correspondence with a camera projection matrix, we model the mapping relationship between the image and point clouds with a…
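
The deterministic correspondence that the abstract contrasts against can be sketched as a plain pinhole projection. This is a minimal illustration only, not code from the paper: the function name `project_point`, the 3x4 matrix `P`, and its numeric values are assumptions chosen for the example, and the point is assumed to be already expressed in the camera frame.

```python
def project_point(P, xyz):
    """Project a 3D point (camera frame) to pixel coordinates.

    P is a 3x4 projection matrix given as nested lists (hypothetical values
    below). Returns (u, v), or None if the point is not in front of the camera.
    """
    x, y, z = xyz
    hom = (x, y, z, 1.0)  # homogeneous coordinates
    u, v, w = (sum(p * h for p, h in zip(row, hom)) for row in P)
    if w <= 0:
        return None  # point lies behind (or on) the image plane
    return (u / w, v / w)


# Assumed pinhole intrinsics: focal length 700 px, principal point (600, 180).
P = [
    [700.0, 0.0, 600.0, 0.0],
    [0.0, 700.0, 180.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

pixel = project_point(P, (1.0, 0.5, 10.0))
print(pixel)
```

Each 3D point maps to exactly one pixel under this scheme, which is the fixed mapping the abstract says AutoAlign replaces with a learned one.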


BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes a new state of the art on nuScenes, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9× lower computation cost.
Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection
Graph-DETR3D is proposed to automatically aggregate multi-view imagery information through graph structure learning (GSL). It constructs a dynamic 3D graph between each object query and the 2D feature maps to enhance the object representations, especially at the border regions.
3D Object Detection for Autonomous Driving: A Review and New Outlooks
This paper conducts a comprehensive survey of the progress in 3D object detection from the aspects of models and sensory inputs, including LiDAR-based, camera-based, and multi-modal detection approaches, and provides an in-depth analysis of the potential and challenges of each category of methods.


PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module
A novel fusion approach named the Point-based Attentive Cont-conv Fusion (PACF) module, which fuses multi-sensor features directly on 3D points, and a 3D multi-sensor multi-task network called Pointcloud-Image RCNN (PI-RCNN for short), which handles the image segmentation and 3D object detection tasks.
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection
A novel fusion module is proposed to enhance the point features with semantic image features in a point-wise manner without any image annotations to address two critical issues in the 3D detection task, including the exploitation of multiple sensors and the inconsistency between the localization and classification confidence.
Joint 3D Proposal Generation and Object Detection from View Aggregation
This work presents AVOD, an Aggregate View Object Detection network for autonomous driving scenarios that uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network.
3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection
In this paper, we propose a new deep architecture for fusing camera and LiDAR sensors for 3D object detection. Because the camera and LiDAR sensor signals have different characteristics and
Multi-view 3D Object Detection Network for Autonomous Driving
This paper proposes Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes and designs a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths.
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
  • Yin Zhou, Oncel Tuzel
  • Computer Science, Environmental Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
VoxelNet is proposed, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network. It learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists.
CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
  • Su Pang, D. Morris, H. Radha
  • Computer Science
    2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2020
A novel Camera-LiDAR Object Candidates (CLOCs) fusion network that provides a low-complexity multi-modal fusion framework that significantly improves the performance of single-modality detectors.
Frustum PointNets for 3D Object Detection from RGB-D Data
This work directly operates on raw point clouds by popping up RGBD scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.
RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection
This paper proposes an anchor-free single-stage LiDAR-based 3D object detector – RangeDet, and designs three components to address two issues overlooked by previous works: the scale variation between nearby and far away objects and the inconsistency between the 2D range image coordinates used in feature extraction and the 3D Cartesian coordinate used in output.
PointPainting: Sequential Fusion for 3D Object Detection
PointPainting is proposed, a sequential fusion method that projects lidar points into the output of an image-only semantic segmentation network and appends the class scores to each point. The paper also addresses how latency can be minimized through pipelining.
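
The "append the class scores to each point" step can be sketched as follows. This is a hedged illustration, not the PointPainting implementation: the function name `paint_points`, the nested-list data layout, and the choice to zero-fill scores for points that fall outside the image are all assumptions made for the example. The pixel coordinates are taken as given; how they are obtained from calibration is outside this sketch.

```python
import math


def paint_points(points, pixels, seg_scores):
    """Append per-pixel class scores to each lidar point.

    points:     list of tuples, e.g. (x, y, z)
    pixels:     list of (u, v) image coordinates, or None when a point
                does not project into the image
    seg_scores: seg_scores[row][col] is a list of class scores
    Points without a valid pixel get all-zero scores (an assumption here).
    """
    height, width = len(seg_scores), len(seg_scores[0])
    num_classes = len(seg_scores[0][0])
    painted = []
    for point, pixel in zip(points, pixels):
        scores = [0.0] * num_classes
        if pixel is not None:
            # floor, not int(), so slightly negative coordinates stay out of bounds
            col, row = math.floor(pixel[0]), math.floor(pixel[1])
            if 0 <= row < height and 0 <= col < width:
                scores = list(seg_scores[row][col])
        painted.append(tuple(point) + tuple(scores))
    return painted


# Hypothetical 2x2 segmentation output with two classes per pixel.
seg_scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
]
points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
pixels = [(1.2, 0.7), None]

painted = paint_points(points, pixels, seg_scores)
print(painted)
```

The painted points keep their original coordinates and gain one extra channel per class, so they can be passed to a lidar-only detector whose input width is enlarged accordingly.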