Joint 3D Proposal Generation and Object Detection from View Aggregation

@inproceedings{ku2018joint,
  title={Joint 3D Proposal Generation and Object Detection from View Aggregation},
  author={Jason Ku and Melissa Mozifian and Jungwook Lee and Ali Harakeh and Steven L. Waslander},
  booktitle={2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2018}
}
We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high-resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes.
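As a rough illustration of the per-anchor crop-and-fuse step behind such a multimodal RPN, the sketch below crops equal-sized regions from a bird's-eye-view (BEV) feature map and an image feature map and fuses them by an element-wise mean. This is a minimal NumPy sketch under stated assumptions: the function names, the nearest-neighbour resize, and the mean fusion are placeholders for illustration, not the paper's implementation.

```python
import numpy as np

def crop_and_resize(feature_map, box, out_size=3):
    """Crop a region from a feature map of shape (H, W, C) given a box
    (y1, x1, y2, x2) in feature-map coordinates, and resize it to
    (out_size, out_size, C) via nearest-neighbour sampling."""
    h, w, _ = feature_map.shape
    y1, x1, y2, x2 = box
    ys = np.clip(np.linspace(y1, y2, out_size).round().astype(int), 0, h - 1)
    xs = np.clip(np.linspace(x1, x2, out_size).round().astype(int), 0, w - 1)
    # np.ix_ builds the outer product of row/column indices,
    # leaving the channel axis untouched.
    return feature_map[np.ix_(ys, xs)]

def fuse_views(bev_map, img_map, bev_box, img_box, out_size=3):
    """Fuse equal-sized feature crops from the BEV and image views
    for one anchor by an element-wise mean."""
    bev_crop = crop_and_resize(bev_map, bev_box, out_size)
    img_crop = crop_and_resize(img_map, img_box, out_size)
    return 0.5 * (bev_crop + img_crop)  # shape (out_size, out_size, C)
```

In the full network, each fused crop would then feed fully connected layers that score and regress the corresponding 3D proposal; resizing both crops to a common size is what makes the element-wise fusion well defined even though the anchor occupies different regions in the two views.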


PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement
This work proposes PointRGCN, a 3D object detection refinement pipeline based on graph convolutional networks (GCNs), which operates exclusively on 3D LiDAR point clouds and achieves state-of-the-art performance on the easy difficulty of the bird's-eye-view detection task.
VIN: Voxel-based Implicit Network for Joint 3D Object Detection and Segmentation for Lidars
A neural network structure for joint 3D object detection and point cloud segmentation that leverages rich supervision from both detection and segmentation labels rather than using just one of them and achieves competitive results against state-of-the-art methods.
Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection
The Stereo RGB and Deeper LIDAR (SRDL) framework is proposed, which exploits semantic and spatial information simultaneously so that the network's 3D object detection performance improves naturally.
PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud
A new backbone network is proposed to perform cross-layer fusion of multi-scale BEV feature maps, making full use of multi-scale information for detection, and it outperforms several previous state-of-the-art methods in both 3D and BEV object detection.
Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds
Voxel-Feature Pyramid Network is presented, a novel one-stage 3D object detector that uses raw data from LIDAR sensors only; it extracts features from point data more effectively and demonstrates its superiority over several baselines on the challenging KITTI-3D benchmark.
Scanet: Spatial-channel Attention Network for 3D Object Detection
A novel Spatial-Channel Attention Network (SCANet) is proposed, a two-stage detector that takes both LIDAR point clouds and RGB images as input to generate 3D object estimates, together with a new multi-level fusion scheme designed for accurate classification and 3D bounding box regression.
Improving Deep Multi-modal 3D Object Detection for Autonomous Driving
  • R. Khamsehashari, K. Schill
  • Computer Science
    2021 7th International Conference on Automation, Robotics and Applications (ICARA)
  • 2021
This paper aims at highly accurate 3D localization and recognition of objects in road scenes by improving the performance of the base architecture, AVOD-FPN, one of the best-performing sensor-fusion methods for 3D object detection.
MapFusion: A General Framework for 3D Object Detection with HDMaps
This paper presents MapFusion, a simple but effective framework that integrates HD map information into modern 3D object detection pipelines, with a FeatureAgg module for HD map feature extraction and fusion and a MapSeg module serving as an auxiliary segmentation head for the detection backbone.
Multi-level Fusion Network for 3D Object Detection from Camera and LiDAR Data
A two-stage 3D object detection system that takes camera and LiDAR data as input and outputs the localization and category of the 3D bounding box, using a novel feature extractor that learns full-resolution features while keeping computation fast, coupled with a multimodal fusion Region Proposal Network (RPN) architecture.
MLOD: A multi-view 3D object detection based on robust feature fusion method
  • Jian Deng, K. Czarnecki
  • Computer Science
    2019 IEEE Intelligent Transportation Systems Conference (ITSC)
  • 2019
This paper introduces a novel detection header, which provides detection results not just from the fusion layer but also from each sensor channel, and achieves state-of-the-art performance on the KITTI 3D object detection benchmark.


Multi-view 3D Object Detection Network for Autonomous Driving
This paper proposes Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes and designs a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths.
3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection
This paper employs a convolutional neural net that exploits context and depth information to jointly regress to 3D bounding box coordinates and object pose and outperforms all existing results in object detection and orientation estimation tasks for all three KITTI object classes.
Monocular 3D Object Detection for Autonomous Driving
This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground-plane, and achieves the best detection performance on the challenging KITTI benchmark, among published monocular competitors.
3D Object Proposals for Accurate Object Class Detection
This method exploits stereo imagery to place proposals in the form of 3D bounding boxes in the context of autonomous driving and outperforms all existing results on all three KITTI object classes.
Vehicle Detection from 3D Lidar Using Fully Convolutional Network
This paper proposes to represent the data in a 2D point map and use a single 2D end-to-end fully convolutional network to predict the objectness confidence and the bounding boxes simultaneously, showing state-of-the-art performance.
2D-Driven 3D Object Detection in RGB-D Images
The approach makes best use of the 2D information to quickly reduce the search space in 3D, benefiting from state-of-the-art 2D object detection techniques.
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
  • Yin Zhou, Oncel Tuzel
  • Computer Science, Environmental Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
VoxelNet is proposed, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network and learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists.
Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks
This paper proposes a computationally efficient approach to detecting objects natively in 3D point clouds using convolutional neural networks (CNNs), leveraging a feature-centric voting scheme to implement novel convolutional layers that explicitly exploit the sparsity encountered in the input.
Frustum PointNets for 3D Object Detection from RGB-D Data
This work directly operates on raw point clouds by popping up RGBD scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
  • S. Song, Jianxiong Xiao
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
This work proposes the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D.