Semantic Scene Completion from a Single Depth Image

Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, and Thomas A. Funkhouser. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Published 28 November 2016.
This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. To leverage the coupled nature of these two tasks, the authors introduce the semantic scene completion network (SSCNet), an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum.
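As a loose illustration of the volumetric input such networks consume (SSCNet itself uses a flipped-TSDF encoding rather than raw occupancy), a single depth image can be back-projected through the pinhole camera model into a voxel grid over the view frustum. The intrinsics, voxel size, and grid dimensions below are made-up values for the sketch:

```python
import numpy as np

def depth_to_voxels(depth, fx, fy, cx, cy, voxel_size=0.1, grid=(24, 24, 24)):
    """Back-project a depth map (metres) into a binary occupancy grid
    anchored at the camera origin. A toy sketch of the kind of volumetric
    input a network like SSCNet consumes (the paper itself encodes the
    volume as a flipped TSDF, not raw occupancy)."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0                       # depth 0 = missing measurement
    x = (us.ravel() - cx) * z / fx      # pinhole back-projection
    y = (vs.ravel() - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)[valid]
    idx = np.floor(pts / voxel_size).astype(int)
    # centre the grid in x/y; z starts at the camera plane
    idx += np.array(grid) // 2 * np.array([1, 1, 0])
    occ = np.zeros(grid, dtype=bool)
    inside = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    occ[tuple(idx[inside].T)] = True
    return occ

# toy example: a flat wall 1 m in front of a 4x4-pixel depth camera
depth = np.full((4, 4), 1.0)
occ = depth_to_voxels(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(occ.sum())  # → 16 occupied voxels, one per depth pixel
```

A real pipeline would follow this with the 3D convolutional network itself; the sketch only shows how a 2D depth observation becomes a sparse volume over the frustum.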

EdgeNet: Semantic Scene Completion from a Single RGB-D Image

This paper presents EdgeNet, a new end-to-end neural network architecture that fuses information from depth and RGB, explicitly representing RGB edges in 3D space, which improves semantic completion scores especially in hard to detect classes.

Semantic Scene Completion via Integrating Instances and Scene in-the-Loop

This work presents a novel framework named Scene-Instance-Scene Network (SISNet), which takes advantage of both instance- and scene-level semantic information, and is capable of inferring fine-grained shape details as well as nearby objects whose semantic categories are easily mixed up.

View-Volume Network for Semantic Scene Completion from a Single Depth Image

A View-Volume convolutional neural network (VVNet) for inferring the occupancy and semantic labels of a volumetric 3D scene from a single depth image, demonstrating its efficiency and effectiveness on both the synthetic SUNCG and real NYU datasets.

Semantic Scene Completion Using Local Deep Implicit Functions on LiDAR Data

A scene segmentation network based on local Deep Implicit Functions is introduced as a novel learning-based method for scene completion, with superior performance on the SemanticKITTI Scene Completion Benchmark in terms of geometric completion intersection-over-union (IoU).
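The implicit-function idea behind this line of work can be sketched as follows: instead of a fixed voxel grid, a learned function maps continuous 3D query coordinates (conditioned on a local latent code) to occupancy, so completed geometry can be queried at arbitrary resolution. The MLP weights and dimensions below are random placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "trained" weights: a 2-layer MLP mapping
# (3 coords + 8-dim local latent code) -> occupancy probability.
W1, b1 = rng.normal(size=(11, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)

def implicit_occupancy(points, latent):
    """Evaluate the implicit function at continuous 3D query points.
    points: (N, 3) coordinates; latent: (8,) code for the local region."""
    x = np.concatenate([points, np.tile(latent, (len(points), 1))], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)             # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid occupancy

# The same local function can be queried at any resolution -- the key
# property that lets implicit methods represent completed scenes
# continuously rather than at a fixed voxel size.
queries = rng.uniform(-1, 1, size=(1000, 3))
probs = implicit_occupancy(queries, latent=rng.normal(size=8))
print(probs.shape)
```

In the actual method the latent codes are predicted per local region from the LiDAR input; here they are random, since the point is only the continuous query interface.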

Two Stream 3D Semantic Scene Completion

This work proposes a two-stream approach that leverages both depth information and semantic information inferred from the RGB image, and substantially outperforms the state of the art for semantic scene completion.

Semantic Scene Completion Combining Colour and Depth: preliminary experiments

The potential of the RGB colour channels to improve SSCNet, a method that performs scene completion and semantic labelling in a single end-to-end 3D convolutional network, is investigated.

ForkNet: Multi-Branch Volumetric Semantic Completion From a Single Depth Image

A novel model is proposed for 3D semantic completion from a single depth image, based on a single encoder and three separate generators that reconstruct different geometric and semantic representations of the scene.

Data Augmented 3D Semantic Scene Completion with 2D Segmentation Priors

SPAwN is presented, a novel lightweight multimodal 3D deep CNN that seamlessly fuses structural data from the depth component of RGB-D images with semantic priors from a bimodal 2D segmentation network.

Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion

This work proposes a novel deep learning framework, named Cascaded Context Pyramid Network (CCPNet), to jointly infer the occupancy and semantic labels of a volumetric 3D scene from a single depth image, and improves the labeling coherence with a cascaded context pyramid.
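The pyramid idea can be sketched loosely (this is not CCPNet's actual architecture) as aggregating a 3D feature volume at several context scales and stacking the results, so each voxel sees both fine detail and coarse surroundings:

```python
import numpy as np

def avg_pool3d(vol, k):
    """Average-pool a cubic feature volume with kernel == stride == k,
    then upsample back to the input size by nearest-neighbour repetition."""
    d = vol.shape[0]
    c = vol[:d - d % k, :d - d % k, :d - d % k]          # crop to a multiple of k
    c = c.reshape(d // k, k, d // k, k, d // k, k).mean(axis=(1, 3, 5))
    return np.repeat(np.repeat(np.repeat(c, k, 0), k, 1), k, 2)[:d, :d, :d]

def context_pyramid(vol, scales=(1, 2, 4)):
    """Stack the volume with progressively coarser context at each scale --
    a loose sketch of pyramid-style context aggregation, not CCPNet itself."""
    return np.stack([avg_pool3d(vol, k) for k in scales], axis=0)

vol = np.random.default_rng(1).random((8, 8, 8))
pyr = context_pyramid(vol)
print(pyr.shape)  # → (3, 8, 8, 8): original plus two coarser context maps
```

A network would learn how to fuse the scales (CCPNet does so in a cascade); the sketch only shows the multi-scale aggregation that gives the "pyramid" its name.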

3D Scene Understanding by Voxel-CRF

A new method is proposed that jointly refines the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the 3D reconstruction, by introducing a new model called Voxel-CRF.

SceneNet: Understanding Real World Indoor Scenes With Synthetic Data

This work focuses its attention on depth based semantic per-pixel labelling as a scene understanding problem and shows the potential of computer graphics to generate virtually unlimited labelled data from synthetic 3D scenes by carefully synthesizing training data with appropriate noise models.

Large-Scale Semantic 3D Reconstruction: An Adaptive Multi-resolution Model for Multi-class Volumetric Labeling

An adaptive multi-resolution formulation of semantic 3D reconstruction which refines the reconstruction only in regions that are likely to contain a surface, exploiting the fact that both high spatial resolution and high numerical precision are only required in those regions.

RGB-(D) scene labeling: Features and algorithms

The main objective is to empirically understand the promises and challenges of scene labeling with RGB-D, and to adapt the framework of kernel descriptors, which converts local similarities (kernels) into patch descriptors, to capture appearance (RGB) and shape (D) similarities.

Structured Prediction of Unobserved Voxels from a Single Depth Image

This work proposes an algorithm that can complete the unobserved geometry of tabletop-sized objects, based on a supervised model trained on already available volumetric elements, that maps from a local observation in a single depth image to an estimate of the surface shape in the surrounding neighborhood.

Predicting Complete 3D Models of Indoor Scenes

This paper aims to interpret indoor scenes from one RGBD image, generating sets of potential object regions, matching to regions in training images, and transferring and aligning associated 3D models while encouraging fit to observations and overall consistency.

Joint 3D Object and Layout Inference from a Single RGB-D Image

This work proposes a high-order graphical model that jointly reasons about the layout, objects, and superpixels in the image, and demonstrates that the proposed method is able to infer scenes with a large degree of clutter and occlusion.

Aligning 3D models to RGB-D images of cluttered scenes

This work first detects and segments object instances in the scene and then uses a convolutional neural network, trained on pixel surface normals in images containing renderings of synthetic objects, to predict the pose of each object.

Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

This work proposes algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information and shows how this contextual information in turn improves object recognition.

Holistic Scene Understanding for 3D Object Detection with RGBD Cameras

A holistic approach is proposed that exploits 2D segmentation, 3D geometry, and contextual relations between scenes and objects, and develops a conditional random field to integrate information from these different sources to classify the cuboids.