Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild

@inproceedings{Wu2020UnsupervisedLO,
  title={Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild},
  author={Shangzhe Wu and C. Rupprecht and Andrea Vedaldi},
  booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={1-10}
}
We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the… 
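The abstract describes factoring an image into depth, albedo, viewpoint and illumination, and exploiting approximate bilateral symmetry to make the factorization identifiable. A minimal NumPy sketch of that symmetry-plus-shading idea (my own illustration with hypothetical function names, not the authors' code): if the predicted depth and albedo are horizontally flipped and re-rendered, a symmetric object should reproduce the same input image.

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate surface normals from a depth map via finite differences."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def render(depth, albedo, light_dir, ambient=0.3):
    """Lambertian shading: image = albedo * (ambient + diffuse term)."""
    n = normals_from_depth(depth)
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    diffuse = np.clip(n @ l, 0.0, None)
    return albedo * (ambient + (1.0 - ambient) * diffuse)

def symmetric_reconstruction_losses(image, depth, albedo, light_dir):
    """Reconstruct the input twice: once from the predicted depth/albedo,
    and once from their horizontal flips. For a symmetric object both
    reconstructions should match the input, giving an unsupervised
    training signal that disentangles shape, albedo and lighting."""
    recon = render(depth, albedo, light_dir)
    recon_flip = render(depth[:, ::-1], albedo[:, ::-1], light_dir)
    loss = np.mean((recon - image) ** 2)
    loss_flip = np.mean((recon_flip - image) ** 2)
    return loss, loss_flip
```

In the paper the flip is applied in canonical object space before a differentiable renderer, and a per-pixel confidence map downweights asymmetric regions; the sketch above only shows why reasoning about shading lets symmetry be exploited at all.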
Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images
TLDR
This paper eliminates the symmetry requirement with a novel unsupervised algorithm that can learn a 3D reconstruction network from a multi-image dataset, and employs a novel albedo loss that improves the detail and realism of the reconstructions.
Unsupervised Learning of 3D Object Categories from Videos in the Wild
TLDR
A new neural network design is proposed, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction while obtaining a detailed implicit representation of the object surface and texture, also compensating for the noise in the initial SfM reconstruction that bootstrapped the learning process.
Shape and Viewpoint without Keypoints
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoint or keypoint supervision.
DOVE: Learning Deformable 3D Objects by Watching Videos
TLDR
DOVE is presented, which learns to predict 3D canonical shape, deformation, viewpoint and texture from a single 2D image of a bird, given a bird video collection as well as automatically obtained silhouettes and optical flows as training data.
Weak Multi-View Supervision for Surface Mapping Estimation
TLDR
A weakly-supervised multi-view learning approach to learn category-specific surface mapping without dense annotations that can generate accurate variations away from the mean shape, is multi-view consistent, and performs comparably to fully supervised approaches is proposed.
Shelf-Supervised Mesh Prediction in the Wild
TLDR
This work proposes a learning-based approach that can train from unstructured image collections, supervised by only segmentation outputs from off-the-shelf recognition systems (i.e. ‘shelf-supervised’).
Learning to Detect 3D Reflection Symmetry for Single-View Reconstruction
TLDR
This work presents a geometry-based end-to-end deep learning framework that first detects the mirror plane of reflection symmetry that commonly exists in man-made objects and then predicts depth maps by finding the intra-image pixel-wise correspondence of the symmetry.
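The geometric primitive behind this correspondence search is reflecting points across a detected mirror plane; a symmetric point pair in the image must project from 3D points related by that reflection. A short illustrative helper (not the paper's implementation):

```python
import numpy as np

def reflect_across_plane(points, n, d):
    """Reflect 3D points (shape (N, 3)) across the plane {x : n . x = d}.

    n need not be unit-length; it is normalized internally. Each point is
    moved by twice its signed distance to the plane, along the normal."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    dist = points @ n - d          # signed distance of each point to the plane
    return points - 2.0 * dist[:, None] * n
```

Applying the reflection twice returns the original points, which is the consistency the framework can exploit when matching symmetric pixel pairs.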
De-rendering 3D Objects in the Wild
TLDR
A weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters is presented.
Self-Supervised 3D Mesh Reconstruction from Single Images
TLDR
This paper proposes a Self-supervised Mesh Reconstruction (SMR) approach to enhance 3D mesh attribute learning, motivated by the observations that 3D attributes from interpolation and prediction should be consistent, and that feature representations of landmarks from all images should be consistent.
Model-based 3D Hand Reconstruction via Self-Supervised Learning
TLDR
This work proposes S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint and utilizes the consistency between 2D and 3D representations and a set of novel losses to rationalize outputs of the neural network.

References

Showing 1-10 of 97 references
Modelling and unsupervised learning of symmetric deformable object categories
TLDR
It is shown that, if only raw images are given, it is possible to look instead for symmetries in the space of object deformations; the work also explains the ambiguities that arise in recovering the pose of symmetric objects from their shape or images, and shows how to discount such ambiguities in learning.
Learning Single-Image 3D Reconstruction by Generative Modelling of Shape, Pose and Shading
TLDR
A unified framework tackling two problems: class-specific 3D reconstruction from a single image, and generation of new 3D shape samples. It can learn to generate and reconstruct concave object classes such as bathtubs and sofas, which methods based on silhouettes cannot learn.
Unsupervised Generative 3D Shape Learning from Natural Images
TLDR
This paper presents the first method to learn a generative model of 3D shapes from natural images in a fully unsupervised way, and demonstrates that this method can learn realistic 3D shape of faces by using only the natural images of the FFHQ dataset.
Learning Category-Specific Mesh Reconstruction from Image Collections
TLDR
A learning framework for recovering the 3D shape, camera, and texture of an object from a single image by incorporating texture inference as prediction of an image in a canonical appearance space and shows that semantic keypoints can be easily associated with the predicted shapes.
HoloGAN: Unsupervised Learning of 3D Representations From Natural Images
TLDR
HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner and is shown to be able to generate images with similar or higher visual quality than other generative models.
Unsupervised learning of object frames by dense equivariant image labelling
TLDR
A new approach is proposed that, given a large number of images of an object and no other supervision, can extract a dense object-centric coordinate frame that is invariant to deformations of the images and comes with a dense equivariant labelling neural network that can map image pixels to their corresponding object coordinates.
3D Shape Induction from 2D Views of Multiple Objects
TLDR
The approach, called "projective generative adversarial networks" (PrGANs), trains a deep generative model of 3D shapes whose projections match the distributions of the input 2D views, allowing it to infer 3D shape and viewpoint and generate novel views from an input image in a completely unsupervised manner.
C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion
TLDR
This work proposes C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images by learning a deep network that reconstructs a 3D object from a single view at a time, and introduces a novel regularization technique.
Exploiting Symmetry and/or Manhattan Properties for 3D Object Structure Estimation from Single and Multiple Images
Yuan Gao, A. Yuille · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
TLDR
This paper proposes a novel rigid structure from motion method, exploiting symmetry and using multiple images from the same category as input, which significantly outperforms baseline methods in the multiple-image case.
Lifting AutoEncoders: Unsupervised Learning of a Fully-Disentangled 3D Morphable Model Using Deep Non-Rigid Structure From Motion
TLDR
This work introduces Lifting Autoencoders, a generative 3D surface-based model of object categories that can be controlled in terms of interpretable geometry and appearance factors, allowing it to perform photorealistic image manipulation of identity, expression, 3D pose, and illumination properties.