BlobGAN: Spatially Disentangled Scene Representations

@article{Epstein2022BlobGANSD,
  title={BlobGAN: Spatially Disentangled Scene Representations},
  author={Dave Epstein and Taesung Park and Richard Zhang and Eli Shechtman and Alexei A. Efros},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.02837}
}
We propose an unsupervised, mid-level representation for a generative model of scenes. The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered “blobs” of features. Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network. Due to the spatial uniformity of blobs and the locality inherent to convolution, our network learns to associate different…
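The mechanism described above (depth-ordered blobs splatted differentiably onto a feature grid, which a convolutional generator then decodes) can be illustrated with a rough sketch. This is only a minimal approximation under assumed choices: an isotropic Gaussian opacity falloff and back-to-front alpha compositing stand in for the paper's actual blob parameterization, and no decoder is included.

```python
# Minimal sketch of "blobs -> feature grid" splatting, assuming isotropic
# Gaussian opacity falloff and back-to-front alpha compositing. The real
# BlobGAN parameterization (blob shape, edge sharpness, style features)
# differs; this only illustrates a spatial, differentiable blob layout.
import torch

def splat_blobs(centers, scales, features, grid_size=16):
    """centers: (K, 2) in [0, 1]; scales: (K,); features: (K, C).
    Returns a (C, H, W) feature grid with blobs composited back to front."""
    K, C = features.shape
    H = W = grid_size
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij"
    )
    grid = torch.zeros(C, H, W)
    alpha_acc = torch.zeros(H, W)  # accumulated opacity
    # Later blobs are composited on top of earlier ones (back to front).
    for k in range(K):
        d2 = (xs - centers[k, 0]) ** 2 + (ys - centers[k, 1]) ** 2
        alpha = torch.exp(-d2 / (2 * scales[k] ** 2))  # soft blob opacity
        grid = alpha * features[k][:, None, None] + (1 - alpha) * grid
        alpha_acc = alpha + (1 - alpha) * alpha_acc
    return grid, alpha_acc

# Toy usage: 3 blobs with 8-dim features on a 16x16 grid.
feats, _ = splat_blobs(torch.rand(3, 2), torch.full((3,), 0.1), torch.randn(3, 8))
print(feats.shape)  # torch.Size([8, 16, 16])
```

Because every operation in the sketch is differentiable with respect to the blob centers, scales, and features, gradients from an image-space loss can flow back into the layout, which is the property the representation relies on.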
Enriching StyleGAN with Illumination Physics
TLDR
This paper shows how to use simple physical properties of images to enrich StyleGAN’s generation capacity. The proposed method, StyLitGAN, can add and remove luminaires in a scene and generate images with realistic lighting effects, requiring no labeled paired relighting data or other geometric supervision.
Rough bibliographical notes on intrinsic images, equivariance and relighting
TLDR
The WHDR evaluation framework was introduced by [12], who constructed a dataset of human judgements comparing the absolute lightness at pairs of points in real images; the resulting metric is known as the weighted human disagreement ratio (WHDR).
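For reference, the WHDR computation itself can be sketched as follows. This is a generic sketch rather than code from the cited work: the data format and names are assumptions, and the 10% "darker" threshold follows the common convention for this metric rather than anything stated above.

```python
# Sketch of the Weighted Human Disagreement Ratio (WHDR): the confidence-weighted
# fraction of pairwise lightness judgements an algorithm's reflectance estimate
# disagrees with. Data format and variable names are assumptions.
def whdr(reflectance, judgements, delta=0.10):
    """reflectance: dict point_id -> predicted reflectance (positive scalar).
    judgements: list of (p1, p2, label, weight) with label in {'1', '2', 'E'}:
    '1' = point 1 darker, '2' = point 2 darker, 'E' = about equal."""
    err, total = 0.0, 0.0
    for p1, p2, label, weight in judgements:
        r1, r2 = reflectance[p1], reflectance[p2]
        if r2 / r1 > 1 + delta:    # point 1 sufficiently darker
            pred = '1'
        elif r1 / r2 > 1 + delta:  # point 2 sufficiently darker
            pred = '2'
        else:
            pred = 'E'
        err += weight * (pred != label)
        total += weight
    return err / total if total > 0 else 0.0

# Toy usage with two hypothetical judgements on the same pair of points.
print(whdr({0: 0.2, 1: 0.5}, [(0, 1, '1', 1.0), (0, 1, 'E', 0.5)]))
```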

References

Showing 1-10 of 101 references
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
TLDR
The key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis; a fast and realistic image synthesis model is proposed on this basis.
3D-Aware Scene Manipulation via Inverse Graphics
TLDR
3D scene de-rendering networks (3D-SDN) are proposed, integrating disentangled representations for semantics, geometry, and appearance into a deep generative model.
HoloGAN: Unsupervised Learning of 3D Representations From Natural Images
TLDR
HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner, and is shown to generate images with visual quality similar to or higher than that of other generative models.
Image Generation from Scene Graphs
TLDR
This work proposes a method for generating images from scene graphs, enabling explicit reasoning about objects and their relationships, and validates this approach on Visual Genome and COCO-Stuff.
Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis
TLDR
This work shows that a highly structured semantic hierarchy emerges in the variation factors learned by state-of-the-art GAN models for scene synthesis, such as StyleGAN and BigGAN, and quantifies the causality between the activations and the semantics occurring in the output image.
Unsupervised Discovery of Object Radiance Fields
TLDR
UORF, trained on multi-view RGB images without annotations, learns to decompose complex scenes with diverse, textured backgrounds from a single image, and performs well on unsupervised 3D scene segmentation, novel view synthesis, and scene editing on three datasets.
BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images
TLDR
The experiments show that using explicit 3D features to represent objects allows BlockGAN to learn disentangled representations both in terms of objects (foreground and background) and their properties (pose and identity).
Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
TLDR
This paper models a scene as a collage of warped, layered objects sampled from labeled reference images, and exploits this representation for several applications: image editing, random scene synthesis, and image-to-anaglyph.
Recovering Surface Layout from an Image
TLDR
This paper takes the first step towards constructing the surface layout, a labeling of the image into geometric classes that coarsely describe the 3D scene orientation of each image region, by learning appearance-based models of these classes.
Describing Visual Scenes using Transformed Dirichlet Processes
TLDR
This work develops a hierarchical probabilistic model for the spatial structure of visual scenes based on the transformed Dirichlet process, a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data.
...