Corpus ID: 229363491

Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild

Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman

Given an “in-the-wild” video of a person, we reconstruct an animatable model of the person in the video. The output model can be rendered in any body pose from any camera view, via the learned controls, without explicit 3D mesh reconstruction. At the core of our method is a volumetric 3D human representation reconstructed with a deep network trained on the input video, enabling novel pose/view synthesis. Our method is an advance over GAN-based image-to-image translation since it allows image synthesis…
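The abstract does not say how the volumetric representation is turned into pixels. A standard choice for volumetric models is emission-absorption volume rendering, sketched below purely as an illustration: `composite_ray` is a hypothetical helper that assumes the network has already produced a density and a color for each sample along one camera ray. It is not the authors' code.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(
    densities: Sequence[float],
    colors: Sequence[RGB],
    deltas: Sequence[float],
) -> Tuple[RGB, float]:
    """Front-to-back alpha compositing of samples along one ray (hypothetical helper).

    densities: non-negative density sigma_i at each sample
    colors:    RGB color c_i at each sample
    deltas:    spacing between consecutive samples
    Returns (composited color, accumulated opacity).
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    out = [0.0, 0.0, 0.0]
    for sigma, c, d in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * d)  # opacity of this ray segment
        weight = transmittance * alpha      # contribution of this sample
        for k in range(3):
            out[k] += weight * c[k]
        transmittance *= 1.0 - alpha
    return (out[0], out[1], out[2]), 1.0 - transmittance
```

Because samples are composited front to back, a dense sample near the camera hides everything behind it, and the accumulated opacity can double as a soft foreground mask.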

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

HumanNeRF is a free-viewpoint rendering method that works on a given monocular video of a human performing complex body motions, e.g., a video from YouTube. It optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps.
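The backward-warp idea can be illustrated with a toy example: to render a frame, each observation-space sample is mapped back into canonical space and the canonical field is queried there. In this sketch the canonical field is a hypothetical unit sphere standing in for the learned network, and the warp is a single rigid motion (rotation about z plus translation). HumanNeRF's actual motion field is learned and much richer; only the composition "query canonical(warp(x))" is meant to carry over.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def canonical_density(p: Vec3) -> float:
    """Toy canonical field (assumption): a solid unit sphere at the origin."""
    x, y, z = p
    return 1.0 if x * x + y * y + z * z <= 1.0 else 0.0


def make_backward_warp(angle_z: float, t: Vec3) -> Callable[[Vec3], Vec3]:
    """Return the inverse of the forward motion x_obs = R_z(angle_z) x_can + t."""
    c, s = math.cos(angle_z), math.sin(angle_z)

    def warp(p: Vec3) -> Vec3:
        # Undo the translation, then rotate by R_z(angle_z)^T = R_z(-angle_z).
        x, y, z = p[0] - t[0], p[1] - t[1], p[2] - t[2]
        return (c * x + s * y, -s * x + c * y, z)

    return warp


def observed_density(p_obs: Vec3, warp: Callable[[Vec3], Vec3]) -> float:
    """Density seen in one frame: warp the sample back, then query the canonical field."""
    return canonical_density(warp(p_obs))
```

Since only the warp changes from frame to frame, a single canonical model is shared across the whole video.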

Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control

Experiments demonstrate that the proposed Neural Actor method achieves better quality than the state of the art on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses.

NeuMan: Neural Human Radiance Field from a Single Video

A novel framework is proposed that reconstructs the human and the scene from just a single in-the-wild video, so that they can be rendered with novel human poses and views.

Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video

Non-Rigid Neural Radiance Fields (NR-NeRF), a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes, takes RGB images of a dynamic scene as input, and creates a high-quality space-time geometry and appearance representation.

SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes

SNARF is introduced, which combines the advantages of linear blend skinning for polygonal meshes with those of neural implicit surfaces by learning a forward deformation field without direct supervision, allowing for generalization to unseen poses.
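SNARF builds on linear blend skinning (LBS), in which a canonical point x is deformed as x' = sum_i w_i B_i x, with skinning weights w_i and bone transforms B_i. The sketch below shows only this forward LBS step with hand-set weights. In SNARF the weights come from a learned field in canonical space, and canonical correspondences for deformed points are found by iterative root finding; neither part is shown here, and the function names are illustrative.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat34 = Sequence[Sequence[float]]  # 3x4 bone transform [R | t], one row per output axis


def apply_rigid(m: Mat34, p: Vec3) -> Vec3:
    """Apply a 3x4 transform [R | t] to point p."""
    x, y, z = p
    r0, r1, r2 = m
    return (
        r0[0] * x + r0[1] * y + r0[2] * z + r0[3],
        r1[0] * x + r1[1] * y + r1[2] * z + r1[3],
        r2[0] * x + r2[1] * y + r2[2] * z + r2[3],
    )


def lbs_forward(p: Vec3, weights: Sequence[float], bones: Sequence[Mat34]) -> Vec3:
    """Forward LBS: x' = sum_i w_i * B_i x (weights assumed to sum to 1)."""
    out = [0.0, 0.0, 0.0]
    for w, bone in zip(weights, bones):
        q = apply_rigid(bone, p)
        for k in range(3):
            out[k] += w * q[k]
    return (out[0], out[1], out[2])
```

A point weighted equally between a static bone and a bone shifted by 2 units along x moves halfway, by 1 unit; blending transformed positions this way is what makes LBS cheap but also prone to artifacts near joints.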

Realistic Full-Body Anonymization with Surface-Guided GANs

This work proposes a new anonymization method that generates close-to-photorealistic humans for in-the-wild images. It introduces Variational Surface-Adaptive Modulation (V-SAM), which embeds surface information throughout the generator and significantly improves image quality and sample diversity.

Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

Panoptic Neural Fields is presented: an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff). It can be smaller and faster than previous object-aware approaches while still leveraging category-specific priors incorporated via meta-learned initialization.

Audio-driven Neural Gesture Reenactment with Video Motion Graphs

A method that reenacts a high-quality video with gestures matching a target speech audio through a novel video motion graph encoding valid transitions between clips and a pose-aware video blending network is proposed.

SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video

SelfRecon, a clothed human body reconstruction method that combines implicit and explicit representations to recover space-time coherent geometries from a monocular self-rotating human video, can produce high-fidelity surfaces for arbitrary clothed humans with self-supervised optimization.

Neural Rendering and Reenactment of Human Actor Videos

The proposed method for generating video-realistic animations of real humans under user control relies on a video sequence in conjunction with a (medium-quality) controllable 3D template model of the person to generate a synthetically rendered version of the video.

Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation

A novel human video synthesis method is presented that addresses limiting factors by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. It shows significant improvement over the state of the art, both qualitatively and quantitatively.

ARCH: Animatable Reconstruction of Clothed Humans

This paper proposes ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image and shows numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.

Learning to Reconstruct People in Clothing From a Single RGB Camera

We present Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving with a reconstruction accuracy of 4…

Textured Neural Avatars

A system for learning full body neural avatars, i.e. deep networks that produce full body renderings of a person for varying body pose and varying camera pose, that is capable of learning to generate realistic renderings while being trained on videos annotated with 3D poses and foreground masks is presented.

Synthesizing Images of Humans in Unseen Poses

A modular generative neural network is presented that synthesizes unseen poses using training pairs of images and poses taken from human action videos, separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background.

Video Based Reconstruction of 3D People Models

This paper describes a method to obtain accurate 3D body models and texture of arbitrary people from a single, monocular video in which a person is moving and presents a robust processing pipeline to infer 3D model shapes including clothed people with 4.5mm reconstruction accuracy.

Neural Rerendering in the Wild

Starting from Internet photos of a tourist landmark, this work applies traditional 3D reconstruction to register the photos and approximate the scene as a point cloud, then trains a deep neural network to learn the mapping from these initial renderings to the actual photos.

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

The proposed Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object, achieves state-of-the-art performance on a public benchmark and outperforms the prior work for clothed human digitization from a single image.
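A pixel-aligned query of the kind PIFu describes can be sketched as f(F(π(X)), z(X)): project a 3D point X into the image, bilinearly sample the image feature map F at that pixel, and pass the feature together with the point's depth z(X) to an MLP f. This sketch assumes an orthographic projection, a single-channel feature map given as a nested list, and a caller-supplied function standing in for the MLP; `bilinear_sample` and `pixel_aligned_query` are hypothetical names, not PIFu's API.

```python
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bilinear_sample(fmap: Sequence[Sequence[float]], u: float, v: float) -> float:
    """Sample a single-channel map at continuous pixel coords (u = column, v = row)."""
    h, w = len(fmap), len(fmap[0])
    u = min(max(u, 0.0), w - 1.0)  # clamp to the valid pixel range
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    top = (1 - ax) * fmap[y0][x0] + ax * fmap[y0][x1]
    bottom = (1 - ax) * fmap[y1][x0] + ax * fmap[y1][x1]
    return (1 - ay) * top + ay * bottom


def pixel_aligned_query(
    point: Vec3,
    fmap: Sequence[Sequence[float]],
    mlp: Callable[[float, float], float],
) -> float:
    """Evaluate f(F(pi(X)), z(X)) with an orthographic projection pi(X) = (x, y)."""
    x, y, z = point
    feature = bilinear_sample(fmap, x, y)  # F(pi(X)): feature at the projected pixel
    return mlp(feature, z)                 # the MLP also sees the depth z(X)
```

Feeding the depth alongside the pixel feature is what lets points on the same camera ray, which share one pixel feature, still receive different predictions.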

Everybody Dance Now

This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes…