Frustum VoxNet for 3D object detection from RGB-D or Depth images

@inproceedings{Shen2020FrustumVF,
  title={Frustum VoxNet for 3D object detection from RGB-D or Depth images},
  author={Xiaoke Shen and Ioannis Stamos},
  booktitle={2020 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2020},
  pages={1687-1695}
}
  • Published 12 October 2019
Recently, a plethora of classification and detection systems from RGB as well as 3D images have appeared. In this work, we describe a new 3D object detection system that operates on an RGB-D or depth-only point cloud. Our system first detects objects in 2D (either from RGB, or from a pseudo-RGB image constructed from depth). The next step is to detect 3D objects within the 3D frustums these 2D detections define. This is achieved by voxelizing parts of the frustums (since frustums can be very large), instead of using the… 
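The pipeline described in the abstract (2D detection, frustum generation from the 2D box, voxelization of part of the frustum) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, grid size, and median-centering heuristic are assumptions for the sake of the example.

```python
import numpy as np

def frustum_voxels(depth, box2d, fx, fy, cx, cy, voxel_size=0.05, grid=(48, 48, 48)):
    """Collect the 3D points falling inside the frustum of a 2D detection
    box and voxelize them into a fixed-size occupancy grid.

    depth          : HxW depth image in meters (0 = missing)
    box2d          : (u_min, v_min, u_max, v_max) pixel coordinates
    fx, fy, cx, cy : pinhole camera intrinsics
    """
    u0, v0, u1, v1 = box2d
    # Back-project every valid depth pixel inside the 2D box.
    vs, us = np.mgrid[v0:v1, u0:u1]
    z = depth[v0:v1, u0:u1]
    valid = z > 0
    z = z[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)  # N x 3 frustum points

    # Voxelize only the populated part of the frustum: center the grid on the
    # median point rather than spanning the whole (possibly huge) frustum.
    center = np.median(pts, axis=0)
    idx = np.floor((pts - center) / voxel_size).astype(int) + np.array(grid) // 2
    inside = np.all((idx >= 0) & (idx < grid), axis=1)
    occ = np.zeros(grid, dtype=np.float32)
    occ[tuple(idx[inside].T)] = 1.0
    return occ
```

The resulting occupancy grid is the kind of fixed-size volumetric input a 3D CNN can consume, regardless of how long the original frustum is.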
3D Object Detection and Instance Segmentation from 3D Range and 2D Color Images †
TLDR
A 3D convolution-based system, named Frustum VoxNet, that generates frustums from 2D detection results, proposes 3D candidate voxelized images for each frustum, and uses a 3D convolutional neural network based on these candidate voxelized images to perform 3D instance segmentation and object detection.
End-to-end 3D object detection with Machine Learning
TLDR
This paper proposes an end-to-end classic Machine Learning (ML) pipeline to solve the 3D object detection problem for cars, leveraging frustum region proposals to segment and estimate the parameters of the amodal 3D bounding box.
A survey of Object Classification and Detection based on 2D/3D data
TLDR
The described systems are organized by application scenarios, data representation methods and main tasks addressed, and critical 2D based systems which greatly influence the 3D ones are also introduced to show the connection between them.
High-level camera-LiDAR fusion for 3D object detection with machine learning
TLDR
This framework uses a Machine Learning (ML) pipeline on a combination of monocular camera and LiDAR data to detect vehicles in the surrounding 3D space of a moving platform to demonstrate an efficient and accurate inference on a validation set.
FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection
TLDR
A frustum-aware geometric reasoning (FGR) method to detect vehicles in point clouds without any 3D annotations and reaches comparable performance with fully supervised methods on the KITTI dataset.
simCrossTrans: A Simple Cross-Modality Transfer Learning for Object Detection with ConvNets or Vision Transformers
TLDR
This work studies CMTL from 2D to 3D sensors to explore the upper-bound performance of 3D-sensor-only systems, which play critical roles in robotic navigation and perform well in low-light scenarios, and names the approach simCrossTrans: simple cross-modality transfer learning with ConvNets or ViTs.
MLVSNet: Multi-level Voting Siamese Network for 3D Visual Tracking
TLDR
A Multi-level Voting Siamese Network (MLVSNet) is proposed for 3D visual tracking from outdoor point cloud sequences to deal with the sparsity of outdoor 3D point clouds, together with an efficient and lightweight Target-Guided Attention (TGA) module that transfers target information and highlights the target points in the search area.
Efficient and accurate object detection for 3D point clouds in intelligent visual internet of things
TLDR
Detection systems are categorized by their main input data (monocular camera, RGB-D images, or LiDAR point clouds) and further subdivided by how the model uses that data, providing a more comprehensive understanding of the safety and efficiency development of driverless technology.
Survey and Systematization of 3D Object Detection Models and Methods
TLDR
A survey of recent developments in 3D object detection is presented, covering the full pipeline from input data, through data representation and feature extraction, to the actual detection modules, together with a systematization that offers a practical framework for comparing those approaches at the method level.

References

Showing 1-10 of 36 references
Frustum PointNets for 3D Object Detection from RGB-D Data
TLDR
This work directly operates on raw point clouds by popping up RGB-D scans and leverages both mature 2D object detectors and advanced 3D deep learning for object localization, achieving efficiency as well as high recall even for small objects.
2D-Driven 3D Object Detection in RGB-D Images
TLDR
The approach makes best use of the 2D information to quickly reduce the search space in 3D, benefiting from state-of-the-art 2D object detection techniques.
Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images
  • S. Song, Jianxiong Xiao
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016
TLDR
This work proposes the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D.
Deep Hough Voting for 3D Object Detection in Point Clouds
TLDR
This work proposes VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting that achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D with a simple design, compact model size and high efficiency.
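The Hough-voting idea behind VoteNet (points vote for object centers, and votes are grouped into object candidates) can be illustrated with a toy vote-grouping step. This greedy clustering is a simplified stand-in, not VoteNet's actual learned vote aggregation, and all names below are hypothetical.

```python
import numpy as np

def cluster_votes(votes, radius=0.3):
    """Greedy grouping of center votes: repeatedly seed a cluster on the
    best-supported vote and absorb every vote within `radius` of it.

    votes : N x 3 array of predicted object-center locations
    Returns a list of (cluster_center, support_count) tuples.
    """
    remaining = votes.copy()
    clusters = []
    while len(remaining) > 0:
        # Count neighbors of every vote and seed on the one with most support.
        d = np.linalg.norm(remaining[:, None] - remaining[None, :], axis=2)
        support = (d < radius).sum(axis=1)
        seed = support.argmax()
        members = d[seed] < radius
        clusters.append((remaining[members].mean(axis=0), int(members.sum())))
        remaining = remaining[~members]
    return clusters
```

Each cluster center would then be refined into a full 3D box by a proposal head; the point of voting is that surface points, which never lie at an object's center, can still agree on where that center is.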
Multi-view 3D Object Detection Network for Autonomous Driving
TLDR
This paper proposes Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes and designs a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths.
A survey of Object Classification and Detection based on 2D/3D data
TLDR
The described systems are organized by application scenarios, data representation methods and main tasks addressed, and critical 2D based systems which greatly influence the 3D ones are also introduced to show the connection between them.
Learning Rich Features from RGB-D Images for Object Detection and Segmentation
TLDR
A new geocentric embedding is proposed for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity to facilitate the use of perception in fields like robotics.
SUN RGB-D: A RGB-D scene understanding benchmark suite
TLDR
This paper introduces an RGB-D benchmark suite for the goal of advancing the state-of-the-arts in all major scene understanding tasks, and presents a dataset that enables the train data-hungry algorithms for scene-understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.
VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
  • Yin Zhou, Oncel Tuzel
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018
TLDR
VoxelNet is proposed, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network and learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists.
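The grouping step at the heart of a voxel feature encoder can be sketched as follows: partition the point cloud into voxels and compute a per-voxel summary. This mean-pooling version is a deliberate simplification (VoxelNet's VFE layers use learned point-wise MLPs before pooling), and the function name is illustrative.

```python
import numpy as np

def voxelize_mean(points, voxel_size=0.2):
    """Group a point cloud into voxels and compute a per-voxel mean feature.

    points : N x 3 array of xyz coordinates
    Returns (coords, feats): unique integer voxel coordinates and the mean
    point position inside each voxel.
    """
    coords = np.floor(points / voxel_size).astype(np.int64)
    # Map every point to the index of its (unique) voxel.
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    counts = np.bincount(inverse, minlength=len(uniq)).astype(float)
    feats = np.zeros((len(uniq), points.shape[1]))
    for dim in range(points.shape[1]):
        feats[:, dim] = np.bincount(
            inverse, weights=points[:, dim], minlength=len(uniq)
        ) / counts
    return uniq, feats
```

The sparse (coords, feats) pairs are what a single-stage detector then feeds into 3D convolutions and the region proposal network.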
Joint 3D Proposal Generation and Object Detection from View Aggregation
TLDR
This work presents AVOD, an Aggregate View Object Detection network for autonomous driving scenarios that uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network.