Sliding Shapes for 3D Object Detection in Depth Images

  • Shuran Song, Jianxiong Xiao

The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. We take a collection of 3D CAD models and render each CAD model from hundreds of viewpoints to obtain synthetic depth maps. For each depth rendering, we extract features from the 3D point cloud and train an Exemplar-SVM classifier. During testing and hard-negative mining, we slide a 3D detection window in 3D space. Experiment results show that our…
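The core mechanism described above — sliding a fixed-size 3D detection window through a voxelized scene and scoring each placement with a linear (Exemplar-SVM-style) classifier — can be sketched minimally as follows. This is an illustrative simplification, not the paper's implementation; the grid, window size, stride, and the weight vector `w` are all placeholder assumptions.

```python
import numpy as np

def slide_3d_window(grid, w, b, window=(10, 10, 10), stride=5, thresh=0.0):
    """Slide a fixed-size 3D window over a voxel feature grid and keep
    placements whose linear-SVM score w.x + b exceeds a threshold.
    `grid` is an (X, Y, Z) occupancy/feature volume; `w` must match the
    flattened window size. All names here are illustrative."""
    dx, dy, dz = window
    X, Y, Z = grid.shape
    detections = []
    for x in range(0, X - dx + 1, stride):
        for y in range(0, Y - dy + 1, stride):
            for z in range(0, Z - dz + 1, stride):
                feat = grid[x:x+dx, y:y+dy, z:z+dz].ravel()
                score = float(w @ feat + b)
                if score > thresh:
                    detections.append(((x, y, z), score))
    return detections
```

In the actual system there is one such linear classifier per CAD exemplar and the features are richer than raw occupancy, but the scan-and-score loop has this shape.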

Single Multi-feature detector for Amodal 3D Object Detection in RGB-D Images

This paper proposes a single end-to-end framework based on deep neural networks that hierarchically incorporates appearance and geometric features, from a 2.5D representation to 3D objects, for fast and high-accuracy amodal 3D object detection in RGB-D images.

2D-Driven 3D Object Detection in RGB-D Images

The approach makes best use of the 2D information to quickly reduce the search space in 3D, benefiting from state-of-the-art 2D object detection techniques.

This work hints at the idea that 2D-driven object detection in 3D should be further explored, especially in cases where the 3D input is sparse.
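The 2D-driven idea above amounts to using a 2D detection box to carve out a frustum of the 3D input before any 3D reasoning. A minimal sketch under a pinhole camera model (the intrinsics `fx, fy, cx, cy` and the box coordinates are illustrative placeholders, not values from the paper):

```python
import numpy as np

def points_in_2d_box(points, box, fx, fy, cx, cy):
    """Keep 3D points (camera coordinates, z forward) whose pinhole
    projection (u, v) falls inside a 2D image box
    (u_min, v_min, u_max, v_max). Points behind the camera are dropped."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    valid = z > 0
    safe_z = np.where(valid, z, 1.0)  # avoid division by zero/negatives
    u = fx * x / safe_z + cx
    v = fy * y / safe_z + cy
    u_min, v_min, u_max, v_max = box
    mask = valid & (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
    return points[mask]
```

Everything outside the frustum is discarded, so the expensive 3D search only runs on the small point subset the 2D detector has already vouched for.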

3D Object Detection Incorporating Instance Segmentation and Image Restoration

A 3D object detection approach that incorporates instance segmentation and image restoration via the Criminisi algorithm, improving the average precision score over the F-PointNet method.

2.5D-VoteNet: Depth Map based 3D Object Detection for Real-Time Applications

2.5D-VoteNet is proposed, a powerful and efficient depth-map-based 3D detection pipeline that achieves state-of-the-art results on the challenging SUN RGB-D benchmark and surpasses the baseline by a clear margin on the ScanNet frame-level detection task.

Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images

  • S. Song, Jianxiong Xiao
  • Computer Science
    2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
This work proposes the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D.

Geometry-Based Region Proposals for Real-Time Robot Detection of Tabletop Objects

We present a novel object detection pipeline for localization and recognition in three-dimensional environments. Our approach makes use of an RGB-D sensor and combines state-of-the-art techniques…

Exploiting Depth From Single Monocular Images for Object Detection and Semantic Segmentation

This paper exploits the recent success of depth estimation from monocular images and learns a deep depth estimation model, and proposes an RGB-D semantic segmentation method, which applies a multi-task training scheme: semantic label prediction and depth value regression.

RGB-(D) scene labeling: Features and algorithms

The main objective is to empirically understand the promises and challenges of scene labeling with RGB-D and adapt the framework of kernel descriptors that converts local similarities (kernels) to patch descriptors to capture appearance (RGB) and shape (D) similarities.

Convolutional-Recursive Deep Learning for 3D Object Classification

This work introduces a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. It obtains state-of-the-art performance on a standard RGB-D object dataset while being more accurate, and faster during training and testing, than comparable architectures such as two-layer CNNs.

A learned feature descriptor for object recognition in RGB-D data

A new, learned, local feature descriptor for RGB-D images, the convolutional k-means descriptor, which automatically learns feature responses in the neighborhood of detected interest points and is able to combine all available information, such as color and depth into one, concise representation.
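The descriptor summarized above is built around a simple recipe: cluster image/depth patches with k-means to learn a small dictionary, then encode new patches by their (soft) similarity to each dictionary atom. A minimal sketch of that recipe, with illustrative sizes and the common "triangle" soft-assignment encoding (a standard choice, not necessarily the paper's exact variant):

```python
import numpy as np

def learn_dictionary(patches, k=8, iters=10, seed=0):
    """Plain k-means on flattened patch vectors; returns k centroids.
    A minimal stand-in for the learned patch dictionary."""
    rng = np.random.default_rng(seed)
    centers = patches[rng.choice(len(patches), k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(patches[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = patches[assign == j].mean(axis=0)
    return centers

def encode(patch, centers):
    """'Triangle' soft assignment: activation max(0, mean_dist - dist),
    giving a sparse, non-negative code of length k."""
    d = np.linalg.norm(centers - patch, axis=1)
    return np.maximum(0.0, d.mean() - d)
```

Because the dictionary is learned rather than hand-designed, the same pipeline applies unchanged to color patches, depth patches, or both concatenated.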

Depth kernel descriptors for object recognition

A set of kernel features on depth images that model size, 3D shape, and depth edges in a single framework that significantly improve the capabilities of depth and RGB-D (color+depth) recognition, achieving 10–15% improvement in accuracy over the state of the art.

Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

This work proposes algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information and shows how this contextual information in turn improves object recognition.

3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction

This work proposes to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network, and naturally supports object recognition from 2.5D depth map and also view planning for object recognition.
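The voxel-grid representation this work operates on — binary occupancy variables on a fixed 3D grid — is straightforward to produce from a 2.5D depth map's point cloud. A minimal sketch (the grid resolution and normalization are illustrative assumptions; 3D ShapeNets uses a 30³ grid, but the network itself is not shown here):

```python
import numpy as np

def voxelize(points, grid=30, bounds=None):
    """Convert an (N, 3) point cloud to a binary occupancy grid of shape
    (grid, grid, grid), the kind of input a 3D shape network consumes."""
    if bounds is None:
        lo, hi = points.min(axis=0), points.max(axis=0)
    else:
        lo, hi = bounds
    scale = (grid - 1) / np.maximum(hi - lo, 1e-9)
    idx = np.clip(((points - lo) * scale).astype(int), 0, grid - 1)
    vox = np.zeros((grid, grid, grid), dtype=bool)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return vox
```

Treating occupancy as a probability distribution over these binary variables is what lets the model both recognize from a partial (2.5D) view and score candidate next views.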

A textured object recognition pipeline for color and depth image data

We present an object recognition system which leverages the additional sensing and calibration information available in a robotics setting together with large amounts of training data to build high…

Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses

A novel framework is proposed that explores the compatibility between segmentation hypotheses of the object in the image and the corresponding 3D map, using a generalization of the structural latent SVM formulation in 3D together with a new loss function defined over the 3D space during training.

Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

A holistic approach is proposed that exploits 2D segmentation, 3D geometry, and contextual relations between scenes and objects, and develops a conditional random field to integrate information from these different sources and classify the cuboids.

Semantic Labeling of 3D Point Clouds for Indoor Scenes

This paper proposes a graphical model that captures various features and contextual relations, including local visual appearance and shape cues, object co-occurrence relationships, and geometric relationships, and applies these algorithms successfully on a mobile robot for the task of finding objects in large cluttered rooms.