DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

Cheng Zhang, Zhaopeng Cui, Cai Chen, Shuaicheng Liu, Bing Zeng, Hujun Bao, Yinda Zhang
2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Panorama images have a much larger field of view than standard perspective images and thus naturally encode richer scene context, which, however, previous scene understanding methods have not fully exploited. In this paper, we propose a novel method for panoramic 3D scene understanding that recovers the 3D room layout and the shape, pose, position, and semantic category of each object from a single full-view panorama image. In order to fully utilize the rich context… 

Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation

This work proposes to learn object deformations and panoramic image distortions through the Deformable Patch Embedding and Deformable MLP components of the authors' Transformer for PAnoramic Semantic Segmentation (Trans4PASS) model. It ties together shared semantics in pinhole and panoramic feature embeddings by generating multi-scale prototype features and aligning them via Mutual Prototypical Adaptation (MPA) for unsupervised domain adaptation.

Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation

This paper addresses panoramic semantic segmentation, which provides a holistic, full-view, dense-pixel understanding of the surroundings, and introduces the upgraded Trans4PASS+ model, featuring DMLPv2 with parallel token mixing to improve flexibility and generalizability in modeling discriminative cues.

Neural Rendering in a Room: Amodal 3D Understanding and Free-Viewpoint Rendering for the Closed Scene Composed of Pre-Captured Objects

The experiments demonstrate that the two-stage design achieves robust 3D scene understanding and outperforms competing methods by a large margin, and show that realistic free-viewpoint rendering enables various applications, including scene touring and editing.

Joint stereo 3D object detection and implicit surface reconstruction

This approach features a new instance-level network that explicitly models the unseen surface hallucination problem using point-based representations and uses a new geometric representation for orientation refinement.

ESCNet: Gaze Target Detection with the Understanding of 3D Scenes

Jun Bao, Buyu Liu, Jun Yu
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This work proposes to explicitly and effectively model 3D geometry in the challenging scenario where only 2D annotations are available, validates the idea on two publicly available datasets, GazeFollow and VideoAttentionTarget, and demonstrates state-of-the-art performance.

Complementary Bi-directional Feature Compression for Indoor 360° Semantic Segmentation with Self-distillation

This paper combines the two different representations and proposes a novel 360° semantic segmentation solution from a complementary perspective, which outperforms state-of-the-art solutions by at least 10% on quantitative evaluations while also producing the best visual results.

Review on Panoramic Imaging and Its Applications in Scene Understanding

This review discusses in detail the broad application prospects and design potential of freeform surfaces, thin-plate optics, and metasurfaces in panoramic imaging, and analyzes how these techniques can help enhance the performance of panoramic imaging systems.

Deep Learning for Omnidirectional Vision: A Survey and New Perspectives

This paper presents a systematic and comprehensive review and analysis of recent progress in deep learning (DL) methods for omnidirectional vision, including a structural and hierarchical taxonomy of DL methods and a summary of the latest learning strategies and applications.

Neural rendering in a room

This paper proposes a novel solution that mimics human scene perception, based on a new paradigm of amodal 3D scene understanding with neural rendering for a closed scene, and exploits compositional neural rendering techniques for data augmentation during offline training.

Self-supervised 360° Room Layout Estimation

2022

PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding

Experiments show that solely based on 3D context without any image region category classifier, the proposed whole-room context model can achieve a comparable performance with the state-of-the-art object detector, demonstrating that when the FOV is large, context is as powerful as object appearance.

DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding

This paper presents an approach to embed 3D context into the topology of a neural network trained to perform holistic scene understanding, and generates partially synthetic depth images rendered by replacing real objects with CAD models of the same object category drawn from a repository.

Holistic 3D Scene Understanding from a Single Image with Implicit Representation

This work proposes an image-based local structured implicit network to improve object shape estimation, and further refines the 3D object pose and scene layout via a novel implicit scene graph neural network that exploits the implicit local object features.

LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image

An algorithm is proposed to predict room layout from a single image that generalizes across panoramas and perspective images, and across cuboid layouts and more general Manhattan layouts (e.g., "L"-shaped rooms); it achieves among the best accuracy on perspective images.

HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation

The proposed network, HorizonNet, trained to predict a 1D layout representation, outperforms previous state-of-the-art approaches; the proposed Pano Stretch data augmentation diversifies panorama data and can be applied to other panorama-related learning tasks.

SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans

A message-passing graph neural network is proposed to model the inter-relationships between objects and layout, guiding the generation of a global object alignment in a scene by considering the global scene layout.

DuLa-Net: A Dual-Projection Network for Estimating Room Layouts From a Single RGB Panorama

This paper proposes a deep learning framework, DuLa-Net, that predicts Manhattan-world 3D room layouts from a single RGB panorama by leveraging two projections of the panorama at once, the equirectangular panorama view and the perspective ceiling view, each of which contains different clues about the room layout.
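The ceiling view mentioned above is a perspective projection of the panorama looking straight up. The following is a minimal sketch of how such a view can be resampled from an equirectangular image; it is not DuLa-Net's released code. The function name `equirect_to_ceiling`, the 160° default field of view, the axis orientation (row 0 is the zenith, column 0 is longitude -pi), and nearest-neighbour sampling are all assumptions made for illustration.

```python
import numpy as np


def equirect_to_ceiling(pano: np.ndarray, out_size: int = 256,
                        fov_deg: float = 160.0) -> np.ndarray:
    """Hypothetical helper: render an upward-looking perspective view.

    Assumes `pano` is an H x W (or H x W x C) equirectangular image whose
    row 0 is the zenith (+90 deg latitude) and whose column 0 is longitude -pi.
    The 160-degree default FoV is an assumption. Uses nearest-neighbour lookup.
    """
    h, w = pano.shape[:2]
    # Half-extent of an image plane placed at distance 1 above the camera.
    half = np.tan(np.radians(fov_deg) / 2.0)
    # Pixel-centre coordinates in [-1, 1], scaled onto the plane z = 1.
    coords = (np.arange(out_size) + 0.5) / out_size * 2.0 - 1.0
    xs, ys = np.meshgrid(coords * half, coords * half)
    zs = np.ones_like(xs)
    # Spherical angles of each viewing ray.
    lon = np.arctan2(ys, xs)                  # longitude in [-pi, pi]
    lat = np.arctan2(zs, np.hypot(xs, ys))    # latitude in (0, pi/2]
    # Map angles to equirectangular pixel indices (wrap longitude, clamp rows).
    u = ((lon + np.pi) / (2.0 * np.pi) * w).astype(int) % w
    v = np.clip(((np.pi / 2.0 - lat) / np.pi * h).astype(int), 0, h - 1)
    # Fancy indexing on the first two axes keeps any trailing channel axis.
    return pano[v, u]
```

For example, calling `equirect_to_ceiling(pano, out_size=512)` on a 512 x 1024 x 3 panorama returns a 512 x 512 x 3 ceiling view. Only the upper hemisphere of the panorama is ever sampled, which matches the idea that the ceiling view isolates the ceiling-wall boundary; bilinear interpolation would give smoother results than the nearest-neighbour lookup used here.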

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image

This paper proposes an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image, and argues that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction.

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

This paper presents a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks, and takes advantage of the availability of professional interior designs to automatically extract 3D structures from them.

Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image

A Holistic Scene Grammar (HSG) is introduced to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes, and significantly outperforms prior methods on 3D layout estimation, 3D object detection, and holistic scene understanding.