DANBO: Disentangled Articulated Neural Body Representations via Graph Neural Networks

Shih-Yang Su, Timur M. Bagautdinov, Helge Rhodin
Deep learning has greatly improved the realism of animatable human models by learning geometry and appearance from collections of 3D scans, template meshes, and multi-view imagery. High-resolution models enable photo-realistic avatars, but at the cost of requiring studio settings not available to end users. Our goal is to create avatars directly from raw images, without relying on expensive studio setups and surface tracking. While a few such approaches exist, those have limited generalization…



Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans
Neural Body is proposed, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh, so that observations across frames can be naturally integrated.
imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose
ImGHUM is presented, the first holistic generative model of 3D human shape and articulated pose, represented as a signed distance function, and has attached spatial semantics making it straightforward to establish correspondences between different shape instances, thus enabling applications that are difficult to tackle using classical implicit representations.
HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields
By modeling a family of shapes in a high-dimensional space, the HyperNeRF model is able to handle topological variation and thereby produce more realistic renderings and more accurate geometric reconstructions.
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
By efficiently rendering anti-aliased conical frustums instead of rays, mip-NeRF reduces objectionable aliasing artifacts and significantly improves NeRF’s ability to represent fine details, while also being 7% faster than NeRF and half the size.
Mixture of volumetric primitives for efficient neural rendering
Mixture of Volumetric Primitives (MVP), a representation for rendering dynamic 3D content that combines the completeness of volumetric representations with the efficiency of primitive-based rendering, is presented.
Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction
This work combines a scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions and shows that this learned volumetric representation allows for photorealistic image generation that surpasses the quality of state-of-the-art video-based reenactment methods.
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
A learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs, and applies it to internet photo collections of famous landmarks, to demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art.
Convolutional Occupancy Networks
Convolutional Occupancy Networks are proposed as a more flexible implicit representation for detailed reconstruction: they enable fine-grained implicit 3D reconstruction of single objects, scale to large indoor scenes, and generalize well from synthetic to real data.
VIBE: Video Inference for Human Body Pose and Shape Estimation
This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.
NASA: Neural Articulated Shape Approximation
This paper introduces neural articulated shape approximation (NASA), an alternative framework that enables efficient representation of articulated deformable objects using neural indicator functions that are conditioned on pose.
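The NASA summary above mentions neural indicator functions conditioned on pose. As a rough illustration only (not NASA's actual architecture), such an indicator can be sketched as a small MLP that takes a 3D query point concatenated with pose parameters and outputs an inside/outside probability; all names and layer sizes below are made up for the sketch.

```python
import numpy as np

def init_mlp(rng, sizes):
    # Random weights for a tiny MLP; sizes = [input, hidden..., output].
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def indicator(params, point, pose):
    # Concatenate the 3D query point with the pose vector, run the MLP,
    # and squash to (0, 1): values above 0.5 are read as "inside the body".
    h = np.concatenate([point, pose])
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)       # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-h[0]))   # sigmoid on the scalar output

rng = np.random.default_rng(0)
params = init_mlp(rng, [3 + 6, 32, 1])   # 3D point + 6 illustrative pose params
occ = indicator(params, np.zeros(3), np.zeros(6))
print(0.0 < occ < 1.0)
```

In the actual paper the representation is learned from posed scans, and the pose conditioning is structured per body part; this sketch only shows the basic idea of querying a learned function at a point under a given pose.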