3D-FFS: Faster 3D object detection with Focused Frustum Search in sensor fusion based networks

  • Aniruddha Ganguly, Tasin Ishmam, Khandker Aftarul Islam, Md. Zahidur Rahman, Md. Shamsuzzoha Bayzid
  • Published 15 March 2021
  • Computer Science
  • 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
In this work, we propose 3D-FFS, a novel approach to making sensor fusion based 3D object detection networks significantly faster using a class of computationally inexpensive heuristics. Existing sensor fusion based networks generate 3D region proposals by leveraging inferences from 2D object detectors. However, since images carry no depth information, these networks must extract semantic features from points across the entire scene to locate each object. By leveraging aggregated intrinsic properties…
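The core idea described above — constraining the search within each frustum using cheap, aggregated priors instead of scanning the whole scene — can be sketched as follows. This is an illustrative reconstruction, not the paper's actual implementation: the function name, the per-class depth bounds, and the choice of x as the forward (depth) axis are all assumptions.

```python
import numpy as np

# Illustrative per-class depth bounds (in meters), e.g. percentiles
# aggregated over training-set ground truth. Values are made up.
CLASS_DEPTH_BOUNDS = {
    "car": (0.0, 70.0),
    "pedestrian": (0.0, 50.0),
}

def focused_frustum_search(points: np.ndarray, cls: str) -> np.ndarray:
    """Crop frustum points to the aggregated depth band for the
    predicted 2D class, shrinking the region a 3D detector must search.

    points: (N, 3) LiDAR points with x as the forward/depth axis.
    cls: class label predicted by the 2D detector for this frustum.
    """
    lo, hi = CLASS_DEPTH_BOUNDS[cls]
    mask = (points[:, 0] >= lo) & (points[:, 0] <= hi)
    return points[mask]
```

Because the filter is a single vectorized comparison, it adds negligible cost while discarding points the downstream 3D network would otherwise process.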

Frustum PointNets for 3D Object Detection from RGB-D Data
This work directly operates on raw point clouds by popping up RGBD scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall for even small objects.
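The "popping up" of RGB-D scans described above amounts to extruding each 2D detection box into a 3D frustum and keeping the points that project inside it. A minimal sketch of that step, assuming a standard 3x4 camera projection matrix and points already in the camera frame (z forward); the function name and box format are illustrative:

```python
import numpy as np

def points_in_frustum(points_xyz: np.ndarray, P: np.ndarray,
                      box2d: tuple) -> np.ndarray:
    """Keep points whose image projection lands inside a 2D detection box.

    points_xyz: (N, 3) points in the camera frame (z is depth).
    P: (3, 4) camera projection matrix.
    box2d: (u_min, v_min, u_max, v_max) in pixel coordinates.
    """
    u_min, v_min, u_max, v_max = box2d
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    proj = pts_h @ P.T                 # (N, 3) homogeneous image coords
    in_front = proj[:, 2] > 0          # discard points behind the camera
    uv = proj[:, :2] / proj[:, 2:3]    # perspective divide
    inside = ((uv[:, 0] >= u_min) & (uv[:, 0] <= u_max) &
              (uv[:, 1] >= v_min) & (uv[:, 1] <= v_max))
    return points_xyz[in_front & inside]
```

The frustum point set returned here is what a PointNet-style network then segments and regresses a 3D box from.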
3D Object Detection Using Scale Invariant and Feature Reweighting Networks
A new network architecture which focuses on utilizing the front view images and frustum point clouds to generate 3D detection results and achieves better performance than the state-of-the-art methods especially when point clouds are highly sparse.
RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement
RoarNet outperforms state-of-the-art methods even in settings where LiDAR and camera are not time-synchronized, which is practically important for actual driving environments.
Multi-view 3D Object Detection Network for Autonomous Driving
This paper proposes Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes and designs a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths.
PIXOR: Real-time 3D Object Detection from Point Clouds
PIXOR is proposed, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions, surpassing other state-of-the-art methods notably in terms of Average Precision (AP) while still running at 10 FPS.
Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection
  • Zhixin Wang, K. Jia
  • Computer Science
    2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2019
A novel method termed Frustum ConvNet (F-ConvNet) aggregates point-wise features as frustum-level feature vectors and arrays these vectors as a feature map for its subsequent fully convolutional network (FCN) component.
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
  • S. Song, Jianxiong Xiao
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
This work proposes the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D.
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
  • Yin Zhou, Oncel Tuzel
  • Computer Science
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
VoxelNet is proposed, a generic 3D detection network that unifies feature extraction and bounding-box prediction in a single-stage, end-to-end trainable deep network and learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists.
Fast Point R-CNN
This work presents a unified, efficient, and effective framework for point-cloud-based 3D object detection that achieves state-of-the-art results at a 15 FPS detection rate.
VoxNet: A 3D Convolutional Neural Network for real-time object recognition
  • Daniel Maturana, S. Scherer
  • Computer Science
    2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • 2015
VoxNet is proposed, an architecture to tackle the problem of robust object recognition by integrating a volumetric Occupancy Grid representation with a supervised 3D Convolutional Neural Network (3D CNN).