3D-Aware Indoor Scene Synthesis with Depth Priors

Zifan Shi, Yujun Shen, Jiapeng Zhu, Dit-Yan Yeung, Qifeng Chen. European Conference on Computer Vision.

Despite the recent advancement of Generative Adversarial Networks (GANs) in learning 3D-aware image synthesis from 2D data, existing methods fail to model indoor scenes due to the large diversity of room layouts and the objects inside. We argue that indoor scenes do not have a shared intrinsic structure, and hence only using 2D images cannot adequately guide the model with the 3D geometry. In this work, we fill in this gap by introducing depth as a 3D prior. Compared with other 3D data…

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

The proposed model spatially disentangles the whole scene into object-centric generative radiance fields by learning on only 2D images with global-local discrimination, and demonstrates state-of-the-art performance on many scene datasets, including the challenging Waymo outdoor dataset.

Deep Generative Models on 3D Representations: A Survey

This survey thoroughly reviews the development of 3D generation, including 3D shape generation and 3D-aware image synthesis, from the perspective of both algorithms and, more importantly, representations.

LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

This work presents an easy-to-use regularizer for GAN training, which explicitly links some axes of the latent space to an image region or a semantic category of the synthesized image, facilitating convenient local control of GAN generation.

SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene

This work introduces SinGRAF, a 3D-aware generative model that is trained with a few input images of a single scene and outperforms the closest related works in both quality and diversity by a large margin.

DAG: Depth-Aware Guidance with Denoising Diffusion Probabilistic Models

A novel guidance approach for diffusion models that uses estimated depth information derived from the rich intermediate representations of diffusion models to make generated images aware of their geometric configuration.

GLeaD: Improving GANs with A Generator-Leading Task

This work proposes a new paradigm for adversarial training, which makes G assign a task to D as well, in the belief that the pioneering attempt presented in this work could inspire the community to design better generator-leading tasks for GAN improvement.

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

3D geometric information is introduced into a human-like spatial reasoning process to capture the contextual knowledge of key objects step by step, achieving state-of-the-art performance on the TextVQA and ST-VQA datasets.

A Survey on 3D-aware Image Synthesis

This survey aims to introduce new researchers to the task of 3D-aware image synthesis, provide a useful reference for related works, and stimulate future research directions through the discussion section.

SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from…

GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis

This paper proposes a generative model for radiance fields which have recently proven successful for novel view synthesis of a single scene, and introduces a multi-scale patch-based discriminator to demonstrate synthesis of high-resolution images while training the model from unposed 2D images alone.
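GRAF builds on the volume-rendering formulation of radiance fields. As background, the standard alpha compositing of radiance samples along a camera ray can be sketched as follows; this is a minimal NumPy illustration of the generic rendering equation, with assumed variable names and sample spacing, not GRAF's actual implementation:

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Alpha-composite radiance samples along one camera ray.

    densities: (N,) non-negative volume densities sigma_i
    colors:    (N, 3) RGB radiance at each sample point
    deltas:    (N,) distances between consecutive samples
    Returns the rendered RGB value for the ray.
    """
    # Per-sample opacity from density and sample spacing.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    # Each sample contributes in proportion to opacity times transmittance.
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)
```

A fully opaque first sample dominates the output color, while zero density everywhere renders to black, which is the behavior the compositing weights are designed to produce.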

Efficient Geometry-aware 3D Generative Adversarial Networks

This work introduces an expressive hybrid explicit implicit network architecture that synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry by decoupling feature generation and neural rendering.

3D-aware Image Synthesis via Learning Structural and Textural Representations

This work proposes a novel framework, termed as VolumeGAN, for high-fidelity 3D-aware image synthesis, through explicitly learning a structural representation and a textural representation in a Generative Adversarial Network.

CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields

This work learns a 3D- and camera-aware generative model which faithfully recovers not only the image but also the camera data distribution, and proposes to decompose the scene into a background and foreground model, leading to more efficient and disentangled scene representations.

Visual Object Networks: Image Generation with Disentangled 3D Representations

A new generative model, Visual Object Networks (VONs), synthesizing natural images of objects with a disentangled 3D representation that enables many 3D operations such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering. Existing approaches however fall short in two ways: first,…

Learning to Recover 3D Scene Shape from a Single Image

A two-stage framework is proposed that first predicts depth up to an unknown scale and shift from a single monocular image, and then uses 3D point cloud encoders to predict the missing depth shift and focal length, allowing recovery of a realistic 3D scene shape.
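The "unknown scale and shift" in the first stage means the predicted depth map is only defined up to an affine transform. As an illustration of what resolving that ambiguity entails, aligning such a prediction to reference metric depth by least squares can be sketched as below; the helper name and setup are hypothetical, not the paper's code:

```python
import numpy as np

def align_depth(pred, target):
    """Solve for scale s and shift t minimizing ||s * pred + t - target||^2.

    pred:   (N,) affine-invariant depth predictions
    target: (N,) reference metric depths at the same pixels
    Returns the prediction mapped into metric units.
    """
    # Set up the linear system [pred, 1] @ [s, t]^T = target.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return s * pred + t
```

In the paper itself the shift (and focal length) are instead predicted by point cloud encoders rather than fitted against ground truth; this sketch only shows the affine ambiguity being resolved.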

GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation

This work proposes a novel approach that regulates point sampling and radiance field learning on 2D manifolds, embodied as a set of learned implicit surfaces in the 3D volume that can produce high quality images with realistic fine details and strong visual 3D consistency.

HoloGAN: Unsupervised Learning of 3D Representations From Natural Images

HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner and is shown to be able to generate images with similar or higher visual quality than other generative models.

Unconstrained Scene Generation with Locally Conditioned Radiance Fields

This work introduces Generative Scene Networks, which learn to decompose scenes into a collection of many local radiance fields that can be rendered from a freely moving camera, and which produce quantitatively higher-quality scene renderings across several different scene datasets.