Corpus ID: 236034070

Unsupervised Discovery of Object Radiance Fields

Hong-Xing Yu, Leonidas J. Guibas, Jiajun Wu
We study the problem of inferring an object-centric scene representation from a single image, aiming to derive a representation that explains the image formation process, captures the scene's 3D nature, and is learned without supervision. Most existing methods for scene decomposition lack one or more of these characteristics, due to the fundamental challenge of integrating the complex 3D-to-2D image formation process into powerful inference schemes like deep networks. In this paper, we propose…
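The central idea the title names, representing each object as its own radiance field and compositing the objects by volume rendering, can be illustrated with a toy sketch. Everything below (the sphere "fields", the density-weighted color mixing, the sampling scheme) is an illustrative assumption for demonstration, not the authors' implementation:

```python
import numpy as np

def sphere_field(center, radius, color):
    """Toy 'object radiance field': constant density inside a sphere, zero outside."""
    def field(pts):
        inside = np.linalg.norm(pts - center, axis=-1) < radius
        sigma = np.where(inside, 10.0, 0.0)            # per-point density
        rgb = np.broadcast_to(color, pts.shape)        # constant object color
        return sigma, rgb
    return field

def render_ray(fields, origin, direction, near=0.0, far=4.0, n_samples=64):
    """Composite several object fields along one ray with volume rendering."""
    t = np.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction              # sample points on the ray
    # Density-weighted mixture of object colors at each point (a common
    # compositional-NeRF convention; an assumption here, not the paper's exact rule).
    sigmas, rgbs = zip(*(f(pts) for f in fields))
    sigma = np.sum(sigmas, axis=0)
    w = np.stack(sigmas) / np.maximum(sigma, 1e-8)
    rgb = np.sum(w[..., None] * np.stack(rgbs), axis=0)
    # Standard quadrature: alpha compositing with accumulated transmittance.
    delta = np.diff(t, append=t[-1] + (t[1] - t[0]))
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = alpha * trans
    return np.sum(weights[:, None] * rgb, axis=0)      # rendered pixel color

# A red sphere in front of a blue one; the ray sees mostly red due to occlusion.
fields = [sphere_field(np.array([0.0, 0.0, 2.0]), 0.5, np.array([1.0, 0.0, 0.0])),
          sphere_field(np.array([0.3, 0.0, 3.0]), 0.5, np.array([0.0, 0.0, 1.0]))]
pixel = render_ray(fields, np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
```

Because compositing happens in 3D before projection, occlusion between objects falls out of the transmittance term rather than needing a learned 2D layering, which is one reason radiance-field decompositions capture the scene's 3D nature.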


ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation
A large set of recent unsupervised multi-object segmentation models is benchmarked on CLEVRTEX; all state-of-the-art approaches fail to learn good representations in the textured setting, despite impressive performance on simpler data.
CoNeRF: Controllable Neural Radiance Fields
This work demonstrates for the first time novel view and novel attribute re-rendering of scenes from a single video by treating the attributes as latent variables that are regressed by the neural network given the scene encoding.
LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks
This is the first work to compress volumetric functions represented by local coordinate-based neural networks, and the approach is expected to be applicable beyond point clouds, for example to compressing high-resolution neural radiance fields.
Semi-Supervised Learning of Multi-Object 3D Scene Representations
This work proposes a novel approach for learning multi-object 3D scene representations from images using a recurrent encoder that learns to decompose images into the constituent objects of the scene and to infer their shape, pose and texture from a single view.
Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation
We present ObSuRF, a method which turns a single image of a scene into a 3D model represented as a set of Neural Radiance Fields (NeRFs), with each NeRF corresponding to a different object.
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
The proposed Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, are demonstrated by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
pixelNeRF: Neural Radiance Fields from One or Few Images
We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images.
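The conditioning idea, querying a radiance field on a 3D point together with an image feature sampled at that point's projection, can be sketched in a few lines. The pinhole intrinsics, nearest-neighbor lookup (bilinear in the real method), and tiny random MLP below are illustrative assumptions, not pixelNeRF's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def project(pt, f=100.0, cx=32.0, cy=32.0):
    """Pinhole projection of a camera-space 3D point to pixel coordinates (toy intrinsics)."""
    x, y, z = pt
    return np.array([f * x / z + cx, f * y / z + cy])

def sample_feature(feat_map, uv):
    """Nearest-neighbor feature lookup; the real method uses bilinear interpolation."""
    u, v = np.clip(np.round(uv).astype(int), 0, np.array(feat_map.shape[:2]) - 1)
    return feat_map[v, u]

# Stand-in for a CNN encoder's output: a 64x64 feature map with 16 channels.
feat_map = rng.normal(size=(64, 64, 16))
W1 = rng.normal(size=(16 + 3, 32))   # MLP weights: (feature + xyz) -> hidden
W2 = rng.normal(size=(32, 4))        # hidden -> (density, rgb)

def conditioned_field(pt):
    """Radiance field conditioned on the image feature at the point's projection."""
    z = sample_feature(feat_map, project(pt))
    h = np.maximum(np.concatenate([pt, z]) @ W1, 0.0)  # one ReLU MLP layer
    out = h @ W2
    return out[0], out[1:]                             # (density, rgb) pre-activation

sigma, rgb = conditioned_field(np.array([0.1, -0.2, 2.0]))
```

Because the field is a function of image features rather than per-scene optimized weights, a single trained network can produce a radiance field from one input view of a new scene.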
Object-Centric Neural Scene Rendering
This work proposes to learn object-centric neural scattering functions (OSFs), a representation that models per-object light transport implicitly using a lighting- and view-dependent neural network, and shows that it generalizes to novel illumination conditions, producing photorealistic, physically accurate renderings of multi-object scenes.
Learning to Infer 3D Object Models from Images
A probabilistic generative model is presented for learning to build modular and compositional 3D object models from observations of a multi-object scene; the learned representation is shown to permit object-wise manipulation and novel scene generation, and to generalize to various settings.
Neural Scene De-rendering
This work proposes a new approach to learn an interpretable distributed representation of scenes, using a deterministic rendering function as the decoder and an object-proposal-based encoder that is trained by minimizing both the supervised prediction and the unsupervised reconstruction errors.
Unsupervised Layered Image Decomposition into Object Prototypes
This work presents an unsupervised learning framework for decomposing images into layers of automatically discovered object models and is the first layered image decomposition algorithm that learns an explicit and shared concept of object type, and is robust enough to be applied to real images.
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
A generative latent variable model, called SPACE, is proposed that provides a unified probabilistic modeling framework that combines the best of spatial-attention and scene-mixture approaches and resolves the scalability problems of previous methods.
3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare
A differentiable Render-and-Compare loss is proposed that allows 3D shape and pose to be learned with 2D supervision and produces a compact 3D representation of the scene, which can be readily used for applications like autonomous driving.