Corpus ID: 229363751

Multi-Modality Cut and Paste for 3D Object Detection

Authors: Wenwei Zhang, Zhe Wang, Chen Change Loy
Three-dimensional (3D) object detection is essential in autonomous driving. It has been observed that multi-modality methods based on both point cloud and imagery features perform only marginally better than, or sometimes worse than, approaches that rely solely on single-modality point clouds. This paper investigates the reason behind this counter-intuitive phenomenon through a careful comparison between the augmentation techniques used by single-modality and multi-modality methods. We found that existing…


VIN: Voxel-based Implicit Network for Joint 3D Object Detection and Segmentation for Lidars
A neural network structure for joint 3D object detection and point cloud segmentation that leverages rich supervision from both detection and segmentation labels, rather than just one of them, and achieves competitive results against state-of-the-art methods.
EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection
EPNet++ is proposed for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion module and a Multi-Modal Consistency (MC) loss, which enforces consistency between the predicted scores of the two modalities to obtain more comprehensive and reliable confidence scores.
Exploring 2D Data Augmentation for 3D Monocular Object Detection
This paper evaluates existing 2D data augmentations and proposes two novel augmentations for monocular 3D detection that require no novel view synthesis, using the RTM3D detection model because of its short training times.
Multimodal Virtual Point 3D Detection
This work presents an approach to seamlessly fuse RGB sensors into Lidar-based 3D recognition, and shows that this framework improves a strong CenterPoint baseline by a significant 6.6 mAP, and outperforms competing fusion approaches.
About the Ambiguity of Data Augmentation for 3D Object Detection in Autonomous Driving
It is shown that the positive effects of different data augmentation methods are not so clear-cut and instead depend strongly on the network architecture and the dataset.
Multi-Modal 3D Object Detection in Autonomous Driving: a Survey
This survey reviews recent fusion-based perception research to bridge the gap and to motivate future work on multi-sensor fusion-based perception.
Towards Deep Radar Perception for Autonomous Driving: Datasets, Methods, and Challenges
A big picture of the deep radar perception stack is provided, including signal processing, datasets, labelling, data augmentation, and downstream tasks such as depth and velocity estimation, object detection, and sensor fusion.


PointPainting: Sequential Fusion for 3D Object Detection
PointPainting is proposed, a sequential fusion method that projects lidar points into the output of an image-only semantic segmentation network and appends the class scores to each point; the paper also shows how latency can be minimized through pipelining.
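The point-decoration step that PointPainting describes can be sketched in a few lines: project each lidar point into the image plane and concatenate the segmentation scores found there onto the point's features. This is an illustrative sketch, not the paper's code; the function name, the `(3, 4)` projection-matrix convention, and the zero-fill for out-of-view points are assumptions.

```python
import numpy as np

def paint_points(points, seg_scores, proj):
    """PointPainting-style decoration (illustrative sketch).

    points:     (N, 3) lidar xyz coordinates
    seg_scores: (H, W, C) per-pixel class scores from an image
                segmentation network
    proj:       (3, 4) lidar-to-image projection matrix (assumed form)
    Returns (N, 3 + C): each point with its sampled class scores appended;
    points that fall outside the image (or behind the camera) get zeros.
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])   # homogeneous coords (N, 4)
    uvw = homo @ proj.T                           # projected coords (N, 3)
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)       # pixel column
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)       # pixel row
    h, w, c = seg_scores.shape
    valid = (uvw[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    painted = np.zeros((n, c))
    painted[valid] = seg_scores[v[valid], u[valid]]
    return np.hstack([points, painted])
```

The decorated points can then be fed to any lidar-only detector unchanged, which is what makes the fusion "sequential".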
Multi-view 3D Object Detection Network for Autonomous Driving
This paper proposes Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes and designs a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths.
Frustum PointNets for 3D Object Detection from RGB-D Data
This work directly operates on raw point clouds by popping up RGB-D scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall even for small objects.
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention
A novel TANet is introduced in this paper, which mainly contains a Triple Attention (TA) module, and a Coarse-to-Fine Regression (CFR) module that boosts the accuracy of localization without excessive computation cost.
3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection
This paper employs a convolutional neural net that exploits context and depth information to jointly regress to 3D bounding box coordinates and object pose and outperforms all existing results in object detection and orientation estimation tasks for all three KITTI object classes.
ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes
This work builds on top of VoteNet and proposes a 3D detection architecture called ImVoteNet specialized for RGB-D scenes, based on fusing 2D votes in images and 3D votes in point clouds, advancing state-of-the-art results by 5.7 mAP.
Deep Hough Voting for 3D Object Detection in Point Clouds
This work proposes VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting that achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency.
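The voting idea behind VoteNet can be illustrated with a toy sketch: each seed point casts a vote toward an object center (seed plus a predicted offset), and nearby votes are clustered into candidate centers. In the paper the offsets come from a learned point-network head; here they are given inputs, and the greedy radius clustering and `radius` parameter are simplifying assumptions of this sketch.

```python
import numpy as np

def hough_vote_centers(seeds, offsets, radius=0.5):
    """Toy deep-Hough-voting sketch: cluster votes into object centers.

    seeds:   (N, 3) seed point coordinates
    offsets: (N, 3) predicted offsets toward object centers
             (learned in VoteNet; supplied directly here)
    Returns (K, 3): one averaged center per vote cluster.
    """
    votes = seeds + offsets                 # each seed casts one vote
    remaining = list(range(len(votes)))
    centers = []
    while remaining:
        anchor = remaining[0]
        # greedily group all votes within `radius` of the anchor vote
        d = np.linalg.norm(votes[remaining] - votes[anchor], axis=1)
        cluster = [remaining[k] for k in np.where(d < radius)[0]]
        centers.append(votes[cluster].mean(axis=0))
        remaining = [k for k in remaining if k not in cluster]
    return np.array(centers)
```

Averaging votes within a cluster is what lets many partial surface observations agree on a single object center, which is the core of the Hough-voting synergy the paper describes.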
3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection
In this paper, we propose a new deep architecture for fusing camera and LiDAR sensors for 3D object detection. Because the camera and LiDAR sensor signals have different characteristics and…
STD: Sparse-to-Dense 3D Object Detector for Point Cloud
This work proposes a two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD), and implements a parallel intersection-over-union (IoU) branch to increase awareness of localization accuracy, resulting in further improved performance.
Joint 3D Proposal Generation and Object Detection from View Aggregation
This work presents AVOD, an Aggregate View Object Detection network for autonomous driving scenarios that uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network.