DeepVoxels: Learning Persistent 3D Feature Embeddings

@article{Sitzmann2019DeepVoxelsLP,
  title={DeepVoxels: Learning Persistent 3D Feature Embeddings},
  author={Vincent Sitzmann and Justus Thies and Felix Heide and Matthias Nie{\ss}ner and Gordon Wetzstein and Michael Zollh{\"o}fer},
  journal={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={2432-2441}
}
In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. Our approach combines insights from 3D geometric computer vision with recent advances in learning image-to-image mappings based on adversarial loss functions. DeepVoxels is supervised using a 2D re-rendering loss, without requiring a 3D reconstruction of the scene, and enforces perspective and multi-view geometry in a principled manner. We…
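As context for the supervision setup described above, the following is a minimal sketch (in PyTorch) of the core idea: a persistent, learnable 3D voxel feature grid is resampled into each target camera view by perspective projection and decoded to an image by a small 2D network, trained purely with a 2D re-rendering loss on posed images. This is not the published DeepVoxels architecture; the class name, grid resolution, max-over-depth occlusion handling, and plain L1 loss are illustrative assumptions, and the paper's learned occlusion reasoning and adversarial loss are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepVoxelsSketch(nn.Module):
    # Persistent 3D feature volume + perspective resampling + 2D decoder (sketch only).
    def __init__(self, feat_dim=16, grid=32, img=64, n_depth=32):
        super().__init__()
        # Persistent 3D feature embedding for a single scene, learned directly as parameters.
        self.voxels = nn.Parameter(0.01 * torch.randn(1, feat_dim, grid, grid, grid))
        self.n_depth, self.img = n_depth, img
        # 2D rendering network mapping projected features to an RGB image.
        self.render = nn.Sequential(
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, K_inv, cam2world, near=0.5, far=2.0):
        # K_inv: (3, 3) inverse camera intrinsics; cam2world: (4, 4) camera pose.
        H = W = self.img
        ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                                torch.arange(W, dtype=torch.float32), indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)        # (H, W, 3) homogeneous pixels
        rays_cam = pix @ K_inv.T                                        # per-pixel ray directions
        depths = torch.linspace(near, far, self.n_depth)                # depth samples along each ray
        pts_cam = rays_cam[None] * depths[:, None, None, None]          # (D, H, W, 3) camera-space points
        pts_h = torch.cat([pts_cam, torch.ones_like(pts_cam[..., :1])], dim=-1)
        pts_world = (pts_h @ cam2world.T)[..., :3]                      # canonical frame of the voxel grid
        # Assumes the scene is scaled to [-1, 1]^3 so world coordinates index the grid directly.
        feats = F.grid_sample(self.voxels, pts_world[None], align_corners=True)  # (1, C, D, H, W)
        # Crude stand-in for occlusion reasoning: max over the depth dimension.
        feats_2d = feats.max(dim=2).values                              # (1, C, H, W)
        return self.render(feats_2d)                                    # (1, 3, H, W)

# Training sketch: only posed 2D images and a 2D re-rendering loss, no 3D supervision.
# model = DeepVoxelsSketch()
# optim = torch.optim.Adam(model.parameters(), lr=1e-3)
# for K_inv, pose, target_rgb in posed_image_loader:   # hypothetical loader of posed images
#     loss = F.l1_loss(model(K_inv, pose), target_rgb)
#     optim.zero_grad(); loss.backward(); optim.step()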

Citations

DeepVoxels++: Enhancing the Fidelity of Novel View Synthesis from 3D Voxel Embeddings
We present a novel view synthesis method based upon latent voxel embeddings of an object, which encode both shape and appearance information and are learned without explicit 3D occupancy supervision.
HoloGAN: Unsupervised Learning of 3D Representations From Natural Images
HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner, and is shown to generate images with visual quality similar to or higher than other generative models.
NVS Machines: Learning Novel View Synthesis with Fine-grained View Control
Presents an approach that learns to synthesize high-quality novel views of 3D objects or scenes while providing fine-grained and precise control over the 6-DOF viewpoint, and generalizes to entirely unseen images such as product images downloaded from the internet.
Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
This work defines the new task of 3D controllable image synthesis and proposes an approach for solving it by reasoning both in 3D space and in the 2D image domain, and demonstrates that the model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images.
Embodied View-Contrastive 3D Feature Learning
This work underlines the importance of 3D representations and egomotion stabilization for visual recognition, and proposes a viable computational model for learning 3D visual feature representations and 3D object bounding boxes supervised by moving and watching objects move.
Fast and Explicit Neural View Synthesis
It is shown that with the simple formulation, the model is able to generalize novel view synthesis to object categories not seen during training and can use view synthesis as a self-supervision signal for efficient learning of 3D geometry without explicit 3D supervision.
GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
GANcraft is presented, an unsupervised neural rendering framework for generating photorealistic images of large 3D block worlds such as those created in Minecraft, and allows user control over both scene semantics and output style.
SynSin: End-to-End View Synthesis From a Single Image
This work proposes a novel differentiable point cloud renderer that is used to transform a latent 3D point cloud of features into the target view and outperforms baselines and prior work on the Matterport, Replica, and RealEstate10K datasets.
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
Proposes Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, and demonstrates them on novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.

References

Showing 1-10 of 69 references
Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
A novel recurrent convolutional encoder-decoder network that is trained end-to-end on the task of rendering rotated objects starting from a single image and allows the model to capture long-term dependencies along a sequence of transformations.
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
An encoder-decoder network with a novel projection loss defined by the projective transformation enables unsupervised learning from 2D observations without explicit 3D supervision, and shows superior performance and better generalization for 3D object reconstruction when the projection loss is used.
Layer-structured 3D Scene Inference via View Synthesis
We present an approach to infer a layer-structured 3D representation of a scene from a single input image. This allows us to infer not only the depth of the visible pixels, but also to capture the…
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
Learning a Multi-View Stereo Machine
End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from far fewer images than required by classical approaches, as well as completion of unseen surfaces.
Transformation-Grounded Image Generation Network for Novel 3D View Synthesis
We present a transformation-grounded image generation network for novel 3D view synthesis from a single image. Our approach first explicitly infers the parts of the geometry visible both in the input…
Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency
Proposes a differentiable formulation that allows computing gradients of the 3D shape given an observation from an arbitrary view, by reformulating view consistency as a differentiable ray consistency (DRC) term, and shows that this formulation can be incorporated into a learning framework to leverage different types of multi-view observations.
Learning Free-Form Deformations for 3D Object Reconstruction
This paper proposes a method to learn free-form deformations (FFD) for the task of 3D reconstruction from a single image and achieves state-of-the-art results on point-cloud and volumetric metrics.
Unsupervised Learning of 3D Structure from Images
This paper learns strong deep generative models of 3D structures, and recovers these structures from 3D and 2D images via probabilistic inference, demonstrating for the first time the feasibility of learning to infer 3D representations of the world in a purely unsupervised manner.
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction
This paper uses 2D convolutional operations to predict the 3D structure from multiple viewpoints, jointly applies geometric reasoning with 2D projection optimization, and introduces the pseudo-renderer, a differentiable module that approximates the true rendering operation, to synthesize novel depth maps for optimization.