Weakly supervised learning of multi-object 3D scene decompositions using deep shape priors
Cathrin Elich, Martin R. Oswald, Marc Pollefeys, and Joerg Stueckler. Computer Vision and Image Understanding.
Learning Multi-Object Dynamics with Compositional Neural Radiance Fields
A key feature of this approach is that the 3D scene information learned through the NeRF model makes it possible to incorporate structural priors into the dynamics models, making long-term predictions more stable.


Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
Scene Representation Networks (SRNs) are proposed: a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, demonstrated on novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation
We present ObSuRF, a method that turns a single image of a scene into a 3D model represented as a set of Neural Radiance Fields (NeRFs), with each NeRF corresponding to a different object.
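To illustrate the compositional idea behind such multi-NeRF scene models (a minimal NumPy sketch, not ObSuRF's actual implementation; `composite` and its conventions are assumptions): a common choice is to sum the densities emitted by the per-object fields at each 3D point and to mix their colors in proportion to those densities.

```python
import numpy as np

def composite(sigmas, colors, eps=1e-8):
    """Blend K per-object field outputs at N sample points.

    sigmas: (K, N) non-negative densities, one row per object field.
    colors: (K, N, 3) RGB values predicted by each object field.
    Returns the total density (N,) and the mixed color (N, 3).
    """
    sigma_total = sigmas.sum(axis=0)                   # densities add up
    weights = sigmas / np.maximum(sigma_total, eps)    # per-object share
    color = (weights[..., None] * colors).sum(axis=0)  # weighted mixture
    return sigma_total, color

# Two objects evaluated at three sample points.
sigmas = np.array([[1.0, 0.0, 0.5],
                   [0.0, 2.0, 0.5]])
colors = np.stack([np.tile([1.0, 0.0, 0.0], (3, 1)),   # object 1 is red
                   np.tile([0.0, 0.0, 1.0], (3, 1))])  # object 2 is blue
sigma_total, color = composite(sigmas, colors)
print(sigma_total)  # [1. 2. 1.]
print(color[2])     # equal densities -> purple mix [0.5 0.  0.5]
```

A volume renderer would then integrate `sigma_total` and `color` along each ray exactly as with a single NeRF, which is what makes the decomposition transparent to the rendering step.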
Learning Single-Image 3D Reconstruction by Generative Modelling of Shape, Pose and Shading
A unified framework tackling two problems: class-specific 3D reconstruction from a single image and generation of new 3D shape samples. It learns to generate and reconstruct concave object classes such as bathtubs and sofas, which silhouette-based methods cannot learn.
Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation
This work proposes a single-shot image-to-semantic-voxel model translation framework that matches and surpasses the state of the art in reconstructing scenes with multiple non-rigid objects of different classes.
Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
An encoder-decoder network with a novel projection loss defined by the projective transformation enables unsupervised learning from 2D observations without explicit 3D supervision, and shows superior performance and better generalization for 3D object reconstruction when the projection loss is used.
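The core idea of such a projection loss can be sketched in a few lines (an assumption-laden simplification: the function names are hypothetical, and an orthographic max-projection stands in for the paper's differentiable perspective transformation): a predicted occupancy grid is rendered to a 2D silhouette and compared against an observed mask, so only 2D supervision is needed.

```python
import numpy as np

def silhouette(voxels, axis=0):
    # Orthographic stand-in for the projective transformation: a pixel
    # is occupied if any voxel along its viewing ray is occupied.
    return voxels.max(axis=axis)

def projection_loss(voxels, observed_mask, axis=0):
    # 2D-only supervision: penalize the mismatch between the rendered
    # silhouette and an observed mask instead of using 3D ground truth.
    return float(np.mean((silhouette(voxels, axis) - observed_mask) ** 2))

# Toy 4x4x4 occupancy grid containing a 2x2x2 cube in one corner.
vox = np.zeros((4, 4, 4))
vox[:2, :2, :2] = 1.0
mask = silhouette(vox)             # pretend this is the observed 2D mask
print(projection_loss(vox, mask))                 # perfect fit -> 0.0
print(projection_loss(np.zeros_like(vox), mask))  # empty guess -> 0.25
```

In the actual method the projection is made differentiable (e.g. via grid sampling along perspective rays) so the loss can train the encoder-decoder end to end.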
Multi-Object Representation Learning with Iterative Variational Inference
This work argues for the importance of learning to segment and represent objects jointly, and demonstrates that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations.
Visual Object Networks: Image Generation with Disentangled 3D Representations
Visual Object Networks (VONs), a new generative model, synthesize natural images of objects with a disentangled 3D representation that enables many 3D operations, such as changing the viewpoint of a generated image, shape and texture editing, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.
DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation
This paper proposes a novel approach to jointly infer the 3D rigid-body poses and shapes of vehicles from stereo images of road scenes and demonstrates that the method significantly improves accuracy for several recent detection approaches.
Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
This work defines the new task of 3D controllable image synthesis and proposes an approach for solving it by reasoning both in 3D space and in the 2D image domain, and demonstrates that the model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images.
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
A generative latent variable model, called SPACE, is proposed that provides a unified probabilistic modeling framework that combines the best of spatial-attention and scene-mixture approaches and resolves the scalability problems of previous methods.