Corpus ID: 232478805

# NeRF-VAE: A Geometry Aware 3D Scene Generative Model

@article{Kosiorek2021NeRFVAEAG,
  title={NeRF-VAE: A Geometry Aware 3D Scene Generative Model},
  author={Adam R. Kosiorek and Heiko Strathmann and Daniel Zoran and Pol Moreno and Ros{\'a}lia G. Schneider and So{\v{n}}a Mokr{\'a} and Danilo Jimenez Rezende},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.00587}
}
We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via NeRF and differentiable volume rendering. In contrast to NeRF, our model takes into account shared structure across scenes, and is able to infer the structure of a novel scene -- without the need to re-train -- using amortized inference. NeRF-VAE's explicit 3D rendering process further contrasts previous generative models with convolution-based rendering which lacks geometric structure. Our model is a VAE…
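As context for the differentiable volume rendering the abstract mentions, a minimal NumPy sketch of the standard quadrature used by NeRF-style renderers follows. The `composite_ray` helper is an illustrative name, not the authors' code; it only shows how per-sample densities and colours are composited into a pixel colour.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Integrate density/colour samples along one ray (standard NeRF quadrature).

    sigmas: (n,) volume densities, colors: (n, 3) RGB values,
    deltas: (n,) distances between consecutive samples.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)           # per-segment opacity
    # transmittance: probability the ray reaches sample i unoccluded
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = trans * alphas                          # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)    # expected colour of the ray
```

Because every operation is differentiable, gradients flow from rendered pixels back to the densities and colours, which is what lets a generative model train through the renderer.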
#### 7 Citations

Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation
• Computer Science, Mathematics
• ArXiv
• 2021
We present ObSuRF, a method which turns a single image of a scene into a 3D model represented as a set of Neural Radiance Fields (NeRFs), with each NeRF corresponding to a different object. A single…
Unsupervised Discovery of Object Radiance Fields
• Hong-Xing Yu
• Computer Science
• ArXiv
• 2021
uORF, trained on multi-view RGB images without annotations, learns to decompose complex scenes with diverse, textured backgrounds from a single image, and performs well on unsupervised 3D scene segmentation, novel view synthesis, and scene editing on three datasets.
SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition
This work presents an unsupervised variational approach to the compositional structure of any given scene, and learns to infer two sets of latent representations from RGB video input, which allows the model, SIMONe, to represent object attributes in an allocentric manner that does not depend on viewpoint.
NeRF in detail: Learning to sample for view synthesis
• Computer Science
• ArXiv
• 2021
Neural radiance field methods have demonstrated impressive novel view synthesis performance by querying a neural network at points sampled along each ray to obtain the density and colour of the sampled points, and integrating this information using the rendering equation.
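The sample-then-integrate loop this snippet describes starts by choosing depths along each ray. A sketch of NeRF-style stratified sampling follows; `stratified_ray_samples` is an illustrative name, not code from this paper (which studies learning where to sample instead).

```python
import numpy as np

def stratified_ray_samples(near, far, n_samples, rng):
    """Draw one depth per uniform bin of [near, far], as in NeRF's coarse pass."""
    edges = np.linspace(near, far, n_samples + 1)
    lower, upper = edges[:-1], edges[1:]
    # one uniform jitter per bin keeps samples ordered but non-deterministic,
    # so training sees the whole interval rather than a fixed grid
    return lower + rng.uniform(size=n_samples) * (upper - lower)
```

The network is then queried at these depths and the results are integrated with the rendering equation.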
An Information-Theoretic Perspective on Proper Quaternion Variational Autoencoders
• Medicine, Computer Science
• Entropy
• 2021
This paper analyzes the QVAE from an information-theoretic perspective, studying the ability of the H-proper model to approximate improper distributions as well as the built-in H-proper ones, and the loss of entropy due to the improperness of the input signal.
Learning to Stylize Novel Views
Experimental results on two diverse datasets of real-world scenes validate that the proposed point cloud-based method generates consistent stylized novel view synthesis results against other alternative approaches.
Stochastic Neural Radiance Fields: Quantifying Uncertainty in Implicit 3D Representations
• Computer Science
• ArXiv
• 2021
Stochastic Neural Radiance Fields is proposed, a generalization of standard NeRF that learns a probability distribution over the possible radiance fields modeling the scene, which allows it to quantify the uncertainty associated with the scene information provided by the model.

#### References

Showing 1–10 of 58 references
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
• Computer Science
• NeurIPS
• 2019
Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, are demonstrated by evaluating them on novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
D-NeRF: Neural Radiance Fields for Dynamic Scenes
• Computer Science
• CVPR
• 2021
D-NeRF is introduced, a method that extends neural radiance fields to a dynamic domain, allowing it to reconstruct and render novel images of objects under rigid and non-rigid motions from a single camera moving around the scene.
HoloGAN: Unsupervised Learning of 3D Representations From Natural Images
• Computer Science
• 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
• 2019
HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner and is shown to be able to generate images with similar or higher visual quality than other generative models.
Nerfies: Deformable Neural Radiance Fields
This method can turn casually captured selfie photos/videos into deformable NeRF models that allow photorealistic renderings of the subject from arbitrary viewpoints, which are dubbed "nerfies".
pixelNeRF: Neural Radiance Fields from One or Few Images
• Computer Science
• CVPR
• 2021
We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. The existing approach for constructing neural radiance fields…
Neural Radiance Flow for 4D View Synthesis and Video Processing
• Computer Science
• ArXiv
• 2020
This work uses a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene, and demonstrates that the learned representation can serve as an implicit scene prior, enabling video processing tasks such as image super-resolution and denoising without any additional supervision.
Occupancy Networks: Learning 3D Reconstruction in Function Space
• Computer Science
• 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
• 2019
This paper proposes Occupancy Networks, a new representation for learning-based 3D reconstruction methods that encodes a description of the 3D output at infinite resolution without excessive memory footprint, and validates that the representation can efficiently encode 3D structure and can be inferred from various kinds of input.
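To make the "function space" idea concrete, a toy sketch follows: an occupancy function maps any continuous 3D point to an occupancy value. The closed-form sphere below is purely illustrative; Occupancy Networks replace it with a neural network conditioned on the input observation.

```python
import numpy as np

def occupancy_sphere(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Toy analytic occupancy function o: R^3 -> {0, 1} for a sphere.

    Only the interface matters here: query arbitrary 3D points, get
    occupancy, with no fixed voxel resolution.
    """
    d = np.linalg.norm(np.asarray(points) - np.asarray(center), axis=-1)
    return (d <= radius).astype(float)
```

Because the function can be evaluated anywhere, a mesh at any desired resolution can later be extracted from its decision boundary.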
GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
• Computer Science, Mathematics
• ICLR
• 2020
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning. Yet, even though tasks in these domains typically involve distinct objects, most…
Single-view to Multi-view: Reconstructing Unseen Views with a Convolutional Network
• Computer Science
• ArXiv
• 2015
A convolutional network capable of generating images of a previously unseen object from arbitrary viewpoints given a single image of this object and an implicit 3D representation of the object class is presented.