HoloGAN: Unsupervised Learning of 3D Representations From Natural Images

Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, Yong-Liang Yang
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. Unlike other GANs, HoloGAN provides explicit control over the pose of generated objects through rigid-body transformations of the learnt 3D features. Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images with…
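To make "rigid-body transformations of the learnt 3D features" concrete, here is a minimal NumPy sketch of that single step. It is not the paper's implementation. It assumes a feature tensor laid out as (channels, depth, height, width), treats the grid axes (D, H, W) as (x, y, z), and rotates about the volume centre. It also uses nearest-neighbour resampling for brevity; a trainable generator would need a differentiable sampler such as trilinear interpolation. The function names `rotation_y` and `rigid_transform_features` are hypothetical.

```python
import numpy as np


def rotation_y(theta):
    """Rotation matrix about the y axis (the grid's H axis) by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def rigid_transform_features(features, rotation, translation=None):
    """Resample a (C, D, H, W) feature grid under a rigid-body transform.

    Each output voxel p_out is mapped back into the input grid with the
    inverse transform p_in = R^T (p_out - t) and filled by nearest-neighbour
    lookup. Coordinates are centred so the rotation pivots on the grid centre.
    """
    if translation is None:
        translation = np.zeros(3)
    C, D, H, W = features.shape
    dims = np.array([D, H, W])
    centre = (dims - 1) / 2.0
    # Integer indices of every output voxel, shape (3, D*H*W).
    idx = np.stack(np.meshgrid(np.arange(D), np.arange(H), np.arange(W),
                               indexing="ij")).reshape(3, -1)
    p_out = idx - centre[:, None]
    p_in = rotation.T @ (p_out - translation[:, None]) + centre[:, None]
    src = np.rint(p_in).astype(int)
    # Output voxels whose source falls outside the input grid stay zero.
    valid = np.all((src >= 0) & (src < dims[:, None]), axis=0)
    out = np.zeros((C, D * H * W), dtype=features.dtype)
    out[:, valid] = features[:, src[0, valid], src[1, valid], src[2, valid]]
    return out.reshape(C, D, H, W)
```

For example, `rigid_transform_features(np.random.rand(8, 16, 16, 16), rotation_y(np.deg2rad(30)))` turns a random 8-channel volume by 30 degrees about its vertical axis. Resampling by the inverse transform, rather than pushing input voxels forward, ensures every output voxel receives exactly one value with no holes.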

Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
This work defines the new task of 3D controllable image synthesis, proposes an approach that reasons both in 3D space and in the 2D image domain, and demonstrates that the model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images.
GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis
This paper proposes a generative model for radiance fields, which have recently proven successful for novel view synthesis of a single scene, and introduces a multi-scale patch-based discriminator to demonstrate synthesis of high-resolution images while training the model from unposed 2D images alone.
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation
Pix2Shape learns a consistent scene representation in its encoded latent space, and its decoder can then be applied to this latent representation to synthesize the scene from a novel viewpoint.
Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images
This paper designs a 3D GAN that can learn a disentangled model of objects from monocular observations alone, and proposes an approach to embed real images into the latent space of the model, enabling editing of real images.
Photo-Geometric Autoencoding to Learn 3D Objects from Unlabelled Images
This work uses generative models to infer the 3D shape of object categories from raw single-view images, using no external supervision, and demonstrates superior accuracy compared to other methods that use supervision at the level of 2D image correspondences.
Generative Neural Articulated Radiance Fields
This work develops a 3D GAN framework that learns to generate radiance fields of human bodies or faces in a canonical pose and warps them, using an explicit deformation, into a desired body pose or facial expression; it demonstrates the first high-quality radiance field generation results for human bodies.
Unsupervised Learning of Depth and Depth-of-Field Effect from Natural Images with Aperture Rendering Generative Adversarial Networks
Takuhiro Kaneko, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This work develops DoF mixture learning, which enables the generator to learn the real image distribution while generating diverse DoF images, and devises a center focus prior to guide the learning direction and address the ambiguities of the unsupervised setting.
2D GANs Meet Unsupervised Single-view 3D Reconstruction
This work proposes a novel image-conditioned neural implicit field, which leverages 2D supervision from GAN-generated multi-view images to perform single-view reconstruction of generic objects.
Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations
This work proposes SIST, a Self-supervised Image to Shape Translation framework that fulfills three tasks: reconstructing the 3D shape from a single image; learning disentangled representations for shape, appearance and viewpoint; and generating a realistic RGB image from these independent factors.
GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds
This work presents GANcraft, an unsupervised neural rendering framework for generating photorealistic images of large 3D block worlds such as those created in Minecraft, which allows user control over both scene semantics and output style.


Visual Object Networks: Image Generation with Disentangled 3D Representations
A new generative model, Visual Object Networks (VONs), synthesizes natural images of objects with a disentangled 3D representation that enables many 3D operations, such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.
DeepVoxels: Learning Persistent 3D Feature Embeddings
This work proposes DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry, based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure.
Learning to Generate and Reconstruct 3D Meshes with only 2D Supervision
This work proposes a unified framework tackling two problems, class-specific 3D reconstruction from a single image and generation of new 3D shape samples; it is comparable or superior to state-of-the-art voxel-based approaches on quantitative metrics while producing results that are visually more pleasing.
GAGAN: Geometry-Aware Generative Adversarial Networks
Experimental results on face generation indicate that GAGAN can generate realistic images of faces with arbitrary facial attributes, such as facial expression, pose, and morphology, that are of better quality than those of current GAN-based methods.
Transformation-Grounded Image Generation Network for Novel 3D View Synthesis
We present a transformation-grounded image generation network for novel 3D view synthesis from a single image. Our approach first explicitly infers the parts of the geometry visible both in the input…
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
An encoder-decoder network with a novel projection loss defined by the projective transformation enables unsupervised learning from 2D observations without explicit 3D supervision, and shows superior performance and better generalization for 3D object reconstruction when the projection loss is used.
Multi-view to Novel View: Synthesizing Novel Views With Self-learned Confidence
This paper proposes an end-to-end trainable framework that learns to exploit multiple viewpoints to synthesize a novel view without any 3D supervision, and introduces a self-learned confidence aggregation mechanism.
Learning Category-Specific Mesh Reconstruction from Image Collections
This work presents a learning framework for recovering the 3D shape, camera, and texture of an object from a single image, which incorporates texture inference as prediction of an image in a canonical appearance space, and shows that semantic keypoints can be easily associated with the predicted shapes.
Generating 3D faces using Convolutional Mesh Autoencoders
This work introduces a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface, and shows that replacing the expression space of an existing state-of-the-art face model with this model achieves a lower reconstruction error.
Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation
This paper proposes a geometry-aware body representation, learned from multi-view images without annotations, that significantly outperforms fully-supervised methods given the same amount of labeled data and improves over other semi-supervised methods while using as little as 1% of the labeled data.