3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera

@inproceedings{3DSceneGraph2019,
  title={3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera},
  author={Iro Armeni and Zhi-Yang He and JunYoung Gwak and Amir Roshan Zamir and Martin Fischer and Jitendra Malik and Silvio Savarese},
  booktitle={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019}
}
A comprehensive semantic understanding of a scene is important for many applications - but in what space should diverse semantic information (e.g., objects, scene categories, material types, 3D shapes, etc.) be grounded and what should be its structure? Aspiring to have one unified structure that hosts diverse types of semantics, we follow the Scene Graph paradigm in 3D, generating a 3D Scene Graph. Given a 3D mesh and registered panoramic images, we construct a graph that spans the entire… 
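The layered structure the abstract describes (a graph spanning the building down to rooms, objects, and cameras, with semantics attached to nodes) can be sketched as a minimal data structure. This is an illustrative sketch only; the names and fields below are assumptions, not the paper's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A scene-graph node (e.g. building, room, object, or camera layer)."""
    id: str
    layer: str                                       # "building", "room", "object", "camera"
    attributes: dict = field(default_factory=dict)   # e.g. class label, material, 3D pose
    children: list = field(default_factory=list)     # edges to lower-layer nodes

def add_child(parent: Node, child: Node) -> None:
    """Add a parent-child edge (e.g. room contains object)."""
    parent.children.append(child)

# Toy hierarchy: building -> room -> object
building = Node("b0", "building", {"area_m2": 320})
room = Node("r1", "room", {"scene_category": "office"})
chair = Node("o7", "object", {"class": "chair", "material": "wood"})
add_child(building, room)
add_child(room, chair)

def objects(n: Node) -> list:
    """Collect all object-layer nodes reachable from n."""
    out = [n] if n.layer == "object" else []
    for c in n.children:
        out += objects(c)
    return out

print([o.id for o in objects(building)])  # ['o7']
```

Diverse semantics (material types, scene categories, poses) then live in the per-node attribute dictionaries, while relationships between layers live in the edges.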


Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions

This work proposes a learned method that regresses a scene graph from the point cloud of a scene, based on PointNet and Graph Convolutional Networks, and introduces 3DSSG, a semi-automatically generated dataset that contains semantically rich scene graphs of 3D scenes.

A Bottom-up Framework for Construction of Structured Semantic 3D Scene Graph

A bottom-up construction framework for structured 3D scene graph generation is proposed, which efficiently describes the objects, relations, and attributes of a 3D indoor environment with a structured representation and significantly outperforms existing methods in terms of accuracy.

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

This paper presents a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks, and takes advantage of the availability of professional interior designs to automatically extract 3D structures from them.

Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs

This work proposes the first work that directly generates shapes from a scene graph in an end-to-end manner, and shows that the same model supports scene modification, using the respective scene graph as interface.

Kimera: From SLAM to spatial perception with 3D dynamic scene graphs

This article attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D dynamic scene graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment.

3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans

This is the first paper that reconciles visual-inertial SLAM and dense human mesh tracking; the proposed representation can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction.

3DP3: 3D Scene Perception via Probabilistic Programming

3DP3 enables scene understanding that is aware of 3D shape, occlusion, and contact structure and is more accurate at 6DoF object pose estimation from real images than deep learning baselines.

A Survey of Scene Graph: Generation and Application

This paper provides a systematic review of existing techniques for scene graph generation and application, covering not only the state of the art but also the latest trends; it discusses scene graph generation methods according to the inference models used for visual relationship detection, and surveys the applications of scene graphs according to the specific visual tasks they serve.

3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable Scene Graphs

The Variable Scene Graph (VSG) is proposed, which augments existing 3D Scene Graph representations with the variability attribute, representing the likelihood of discrete long-term change events, and a novel method is presented, DeltaVSG, to estimate the variability of VSGs in a supervised fashion.

Hydra: A Real-time Spatial Perception Engine for 3D Scene Graph Construction and Optimization

This paper describes the first real-time Spatial Perception engine (SPIN), a suite of algorithms to build a 3D scene graph from sensor data in real time, implemented in a highly parallelized architecture that combines fast early- and mid-level perception processes with slower high-level perception processes.

3D Semantic Parsing of Large-Scale Indoor Spaces

This paper argues that identification of structural elements in indoor spaces is essentially a detection problem, rather than the segmentation commonly used, and proposes a hierarchical method for semantically parsing the 3D point cloud of an entire building.

Building a database of 3D scenes from user annotations

A model is described that integrates cues extracted from the object labels to infer the implicit geometric information and it is shown how it can find better scene matches for an unlabeled image by expanding the database through viewpoint interpolation to unseen views.

A Robust 3D-2D Interactive Tool for Scene Segmentation and Annotation

This paper aims to build a robust annotation tool that effectively and conveniently enables the segmentation and annotation of massive 3D data, and works by coupling 2D and 3D information via an interactive framework, through which users can provide high-level semantic annotation for objects.

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

A Holistic Scene Grammar (HSG) is introduced to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes, and significantly outperforms prior methods on 3D layout estimation, 3D object detection, and holistic scene understanding.

Scene Parsing by Integrating Function, Geometry and Appearance Models

The proposed approach not only significantly widens the scope of indoor scene parsing algorithm from the segmentation and the 3D recovery to the functional object recognition, but also yields improved overall performance.

Scene Graph Generation from Objects, Phrases and Region Captions

This work proposes a novel neural network model, termed as Multi-level Scene Description Network (denoted as MSDN), to solve the three vision tasks jointly in an end-to-end manner and shows the joint learning across three tasks with the proposed method can bring mutual improvements over previous models.

Joint 2D-3D-Semantic Data for Indoor Scene Understanding

A dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations, enables development of joint and cross-modal learning models and potentially unsupervised approaches utilizing the regularities present in large-scale indoor spaces.

SEGCloud: Semantic Segmentation of 3D Point Clouds

SEGCloud is presented, an end-to-end framework to obtain 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI) and fully connected Conditional Random Fields (FC-CRF).

Understanding Indoor Scenes Using 3D Geometric Phrases

A hierarchical scene model for learning and reasoning about complex indoor scenes is presented; it is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification.

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

The value of the synthesized dataset is demonstrated by improving performance on certain machine-learning-based scene understanding tasks (depth and surface normal prediction, semantic segmentation, reconstruction, etc.) and by providing benchmarks for, and diagnostics of, trained models through controllable modification of object attributes and scene properties.