ANR: Articulated Neural Rendering for Virtual Avatars

@article{raj2021anr,
  title={ANR: Articulated Neural Rendering for Virtual Avatars},
  author={Amit Raj and Julian Tanke and James Hays and Minh Vo and Carsten Stoll and Christoph Lassner},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
}
  • Published 23 December 2020
  • Computer Science
The combination of traditional rendering with neural networks in Deferred Neural Rendering (DNR) [38] provides a compelling balance between computational complexity and realism of the resulting images. Using skinned meshes for rendering articulated objects is a natural extension of the DNR framework and would open it up to a plethora of applications. However, in this case the neural shading step must account for deformations that are possibly not captured in the mesh, as well as alignment in…
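To make the DNR idea the abstract builds on concrete — rasterize a learned "neural texture" via the mesh's UV coordinates, then let a small network shade the resulting feature buffer into RGB — here is a minimal NumPy sketch. All names, shapes, and the tiny MLP shader are hypothetical illustrations, not ANR's actual architecture; in DNR the texture and shader are optimized jointly against captured images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned assets (in DNR these are trained end-to-end).
TEX_RES, FEAT_DIM = 64, 8                     # neural texture: 64x64 map of 8-D features
neural_texture = rng.normal(size=(TEX_RES, TEX_RES, FEAT_DIM))
W1 = rng.normal(size=(FEAT_DIM, 16)) * 0.1    # tiny "neural shader" MLP
W2 = rng.normal(size=(16, 3)) * 0.1           # hidden -> RGB

def sample_texture(uv):
    """Nearest-neighbour lookup of the neural texture at UV coords in [0, 1)."""
    ij = np.clip((uv * TEX_RES).astype(int), 0, TEX_RES - 1)
    return neural_texture[ij[..., 1], ij[..., 0]]          # (H, W, FEAT_DIM)

def neural_shade(features):
    """Map the rasterized feature buffer to RGB (the deferred shading pass)."""
    h = np.maximum(features @ W1, 0.0)                     # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2)))                 # sigmoid -> RGB in (0, 1)

# A classical rasterizer would produce per-pixel UVs from the (skinned) mesh;
# here we fake a 32x32 UV buffer covering the texture.
H = W = 32
u, v = np.meshgrid(np.linspace(0, 1, W, endpoint=False),
                   np.linspace(0, 1, H, endpoint=False))
uv_buffer = np.stack([u, v], axis=-1)                      # (H, W, 2)

rgb = neural_shade(sample_texture(uv_buffer))              # (H, W, 3)
print(rgb.shape)   # (32, 32, 3)
```

The point ANR's abstract raises is that once the mesh articulates via skinning, the UV buffer alone is not enough: the shader must also compensate for deformations and misalignments the mesh does not capture.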


DRaCoN - Differentiable Rasterization Conditioned Neural Radiance Fields for Articulated Avatars
DRaCoN is presented, a framework for learning full-body volumetric avatars which exploits the advantages of both the 2D and 3D neural rendering techniques.
Animatable Neural Implicit Surfaces for Creating Avatars from Videos
This paper proposes Animatable Neural Implicit Surface (AniSDF), which models the human geometry with a signed distance field and defers the appearance generation to the 2D image space with a 2D neural renderer, enabling the high-quality reconstruction of human bodies.
HVTR: Hybrid Volumetric-Textural Rendering for Human Avatars
A novel neural rendering pipeline, Hybrid Volumetric-Textural Rendering (HVTR), which synthesizes virtual human avatars from arbitrary poses efficiently and at high quality and enables HVTR to handle complicated motions, render high quality avatars under user-controlled poses/shapes and even loose clothing, and most importantly, be fast at inference time.
High-Fidelity Human Avatars from a Single RGB Camera
A coarse-to-fine framework to reconstruct a personalized high-fidelity human avatar from a monocular video and designs a dynamic surface network to recover pose-dependent surface deformations, which help to decouple the shape and texture of the person.
Structured Local Radiance Fields for Human Avatar Modeling
A novel representation on the basis of recent neural scene rendering techniques that enables automatic construction of animatable human avatars for various types of clothes without the need for scanning subject-specific templates, and can generate realistic images with dynamic details for novel poses.
Modeling clothing as a separate layer for an animatable human avatar
This work proposes a method to build an animatable clothed body avatar with an explicit representation of the clothing on the upper body from multi-view captured videos, and shows the benefit of an explicit clothing model that allows the clothing texture to be edited in the animation output.
Neural Proxy
This work proposes neural proxy, a novel neural rendering model that utilizes animatable proxies for representing photo-realistic targets and is able to render unseen animations without any temporal learning.
Neural Actor
Experiments demonstrate that the proposed Neural Actor achieves better quality than the state-of-the-arts on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses.
TAVA: Template-free Animatable Volumetric Actors
This paper proposes TAVA, a method to create Template-free Animatable Volumetric Actors, based on neural representations that relies solely on multi-view data and a tracked skeleton to create a volumetric model of an actor, which can be animated at test time given a novel pose.
The Power of Points for Modeling Humans in Clothing
A neural network is trained from 3D point clouds of many types of clothing, on many bodies, in many poses, and learns to model pose-dependent clothing deformations that can be optimized to fit a previously unseen scan of a person in clothing, enabling the scan to be reposed realistically.
Neural Rendering and Reenactment of Human Actor Videos
The proposed method for generating video-realistic animations of real humans under user control relies on a video sequence in conjunction with a (medium-quality) controllable 3D template model of the person to generate a synthetically rendered version of the video.
ARCH: Animatable Reconstruction of Clothed Humans
This paper proposes ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image and shows numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.
Textured Neural Avatars
A system for learning full body neural avatars, i.e. deep networks that produce full body renderings of a person for varying body pose and varying camera pose, that is capable of learning to generate realistic renderings while being trained on videos annotated with 3D poses and foreground masks is presented.
Deferred Neural Rendering: Image Synthesis using Neural Textures
This work proposes Neural Textures, which are learned feature maps that are trained as part of the scene capture process that can be utilized to coherently re-render or manipulate existing video content in both static and dynamic environments at real-time rates.
LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering
A novel approach that augments real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint and jointly performs completion, super-resolution, and denoising of the imagery in real time.
Deep appearance models for face rendering
A data-driven rendering pipeline that learns a joint representation of facial geometry and appearance from a multiview capture setup and a novel unsupervised technique for mapping images to facial states results in a system that is naturally suited to real-time interactive settings such as Virtual Reality (VR).
GeLaTO: Generative Latent Textured Objects
Generative Latent Textured Objects (GeLaTO), a compact representation that combines a set of coarse shape proxies defining low frequency geometry with learned neural textures, to encode both medium and fine scale geometry as well as view-dependent appearance.
Volumetric Capture of Humans With a Single RGBD Camera via Semi-Parametric Learning
This work proposes an end-to-end framework that fuses both data sources to generate novel renderings of the performer, and shows that the framework is able to achieve compelling results, with substantially less infrastructure than previously required.
Video Based Reconstruction of 3D People Models
This paper describes a method to obtain accurate 3D body models and texture of arbitrary people from a single, monocular video in which a person is moving and presents a robust processing pipeline to infer 3D model shapes including clothed people with 4.5mm reconstruction accuracy.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
This work describes how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrates results that outperform prior work on neural rendering and view synthesis.