Object Proposals Estimation in Depth Image Using Compact 3D Shape Manifolds

@inproceedings{Zheng2015ObjectPE,
  title={Object Proposals Estimation in Depth Image Using Compact 3D Shape Manifolds},
  author={Shuai Zheng and Victor Adrian Prisacariu and Melinos Averkiou and Ming-Ming Cheng and Niloy Jyoti Mitra and Jamie Shotton and Philip H. S. Torr and Carsten Rother},
  booktitle={GCPR},
  year={2015}
}
Man-made objects, such as chairs, often have very large shape variations, making it challenging to detect them. In this work we investigate the task of finding particular object shapes from a single depth image. We tackle this task by exploiting the inherently low dimensionality in the object shape variations, which we discover and encode as a compact shape space. Starting from any collection of 3D models, we first train a low dimensional Gaussian Process Latent Variable Shape Space. We then… 

DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation

TLDR
This paper proposes a novel approach to jointly infer the 3D rigid-body poses and shapes of vehicles from stereo images of road scenes and demonstrates that the method significantly improves accuracy for several recent detection approaches.

Deep Structured Models for Large Scale Object Co-detection and Segmentation

TLDR
This thesis introduces a principled formulation for object co-detection using a fully connected conditional random field (CRF), designs a weighted mixture of Gaussian kernels for class-specific object similarity, and formulates kernel weight estimation as a least-squares regression problem.

Semantic context and depth-aware object proposal generation

TLDR
This paper presents a context-aware object proposal generation method for stereo images that significantly improves the quality of the initial proposals and achieves state-of-the-art performance using only a fraction of the original object candidates.

Scaling CNNs for High Resolution Volumetric Reconstruction from a Single Image

TLDR
This work presents a scalable 2-D single-view to 3-D volume reconstruction deep learning method, where the 3-D (deconvolution) decoder is replaced by a simple inverse discrete cosine transform (IDCT) decoder.

Learning 3D Shape Completion Under Weak Supervision

TLDR
This work proposes a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision and outperforms the data-driven approach of Engelmann et al.

Learning to Co-Generate Object Proposals with a Deep Structured Network

TLDR
A deep structured network is introduced that jointly predicts the objectness scores and the bounding box locations of multiple object candidates, together with an end-to-end learning algorithm that backpropagates the loss gradient throughout the entire structured network.

Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors

TLDR
This work proposes a novel approach for using compact shape manifolds of the shape within an object class for object segmentation, pose and shape estimation and demonstrates that the shape manifold alignment method yields improved results over the initial stereo reconstruction and object detection method in depth and pose accuracy.

Depth-aware layered edge for object proposal

TLDR
A novel object proposal method for RGB-D images based on layered edges is proposed, which can effectively eliminate the influence of the mixture of edges from objects and background and improve the accuracy of proposals.

Richer Convolutional Features for Edge Detection

TLDR
The proposed network fully exploits multiscale and multilevel information of objects to perform the image-to-image prediction by combining all the meaningful convolutional features in a holistic manner and achieves state-of-the-art performance on several available datasets.

Richer Convolutional Features for Edge Detection

TLDR
RCF encapsulates all convolutional features into a more discriminative representation, which makes good use of rich feature hierarchies, is amenable to training via backpropagation, and achieves state-of-the-art performance on several available datasets.

References


Sliding Shapes for 3D Object Detection in Depth Images

TLDR
This paper proposes to use depth maps for object detection and design a 3D detector to overcome the major difficulties for recognition, namely the variations of texture, illumination, shape, viewpoint, clutter, occlusion, self-occlusion and sensor noises.

Real-time human pose recognition in parts from single depth images

TLDR
This work takes an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem, and generates confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.

Simultaneous Monocular 2D Segmentation, 3D Pose Recovery and 3D Reconstruction

TLDR
The solution is to learn nonlinear and probabilistic low dimensional latent spaces, using the Gaussian Process Latent Variable Models dimensionality reduction technique, which act as class or activity constraints to a simultaneous and variational segmentation, recovery and reconstruction process.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

TLDR
A new geocentric embedding is proposed for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity to facilitate the use of perception in fields like robotics.

BING: Binarized normed gradients for objectness estimation at 300fps

TLDR
To improve localization quality of the proposals while maintaining efficiency, a novel fast segmentation method is proposed, and its effectiveness for improving BING’s localization performance is demonstrated when used in multi-thresholding straddling expansion (MTSE) post-processing.

Object discovery in 3D scenes via shape analysis

We present a method for discovering object models from 3D meshes of indoor environments. Our algorithm first decomposes the scene into a set of candidate mesh segments and then ranks each segment

Aligning 3D models to RGB-D images of cluttered scenes

TLDR
This work first detects and segments object instances in the scene and then uses a convolutional neural network to predict the pose of the object; the network is trained using pixel surface normals in images containing renderings of synthetic objects.

Dense Reconstruction Using 3D Object Shape Priors

TLDR
A formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction, and automatically augments the SLAM system with object specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery.

Multi-View Priors for Learning Detectors from Sparse Viewpoint Data

TLDR
This model represents prior distributions over permissible multi-view detectors in a parametric way -- the priors are learned once from training data of a source object class, and can later be used to facilitate the learning of a detector for a target class.