ICON: Implicit Clothed humans Obtained from Normals

@inproceedings{Xiu2021ICONIC,
  title={ICON: Implicit Clothed humans Obtained from Normals},
  author={Yuliang Xiu and Jinlong Yang and Dimitrios Tzionas and Michael J. Black},
  booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
  pages={13286-13296}
}
Current methods for learning realistic and animatable 3D clothed avatars need either posed 3D scans or 2D images with carefully controlled user poses. In contrast, our goal is to learn an avatar from only 2D images of people in unconstrained poses. Given a set of images, our method estimates a detailed 3D surface from each image and then combines these into an animatable avatar. Implicit functions are well suited to the first task, as they can capture details like hair and clothes. Current… 
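
The abstract describes estimating a detailed implicit surface from each image before merging the results into an animatable avatar. As a rough illustration of how such an implicit function is queried point by point, the following is a minimal PyTorch sketch of a local-feature occupancy MLP in the spirit of ICON; the 7-D feature layout (signed distance to the closest SMPL body point, that point's normal, and a clothed normal sampled from predicted front/back normal maps) follows the paper's description, but the class name, layer sizes, and usage are illustrative assumptions rather than the authors' code.

```python
# Hedged sketch (not the authors' code): an ICON-style point classifier that
# predicts inside/outside occupancy from purely local, body-guided features.
import torch
import torch.nn as nn

class LocalOccupancyMLP(nn.Module):
    """Maps a per-point local feature vector to an occupancy probability.

    Assumed 7-D feature layout per query point:
      [0]    signed distance to the closest point on the fitted SMPL body
      [1:4]  normal of that closest body point
      [4:7]  clothed normal sampled from the predicted front/back normal maps
    """
    def __init__(self, feat_dim: int = 7, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, local_feats: torch.Tensor) -> torch.Tensor:
        # local_feats: (batch, num_points, 7) -> occupancy in [0, 1]
        return self.net(local_feats)

# Toy usage with placeholder features; in practice the features are gathered
# from the SMPL fit and the predicted clothed normal maps.
model = LocalOccupancyMLP()
occupancy = model(torch.randn(2, 4096, 7))   # (2, 4096, 1)
```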

ECON: Explicit Clothed humans Obtained from Normals

The combination of deep learning, artist-curated scans, and Implicit Functions (IF), is enabling the creation of detailed, clothed, 3D humans from images. However, existing methods are far from…

One-shot Implicit Animatable Avatars with Model-based Priors

Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCap, Human3.6M, and DeepFashion, show that ELICIT outperforms strong baseline methods of avatar creation when only a single image is available.

gDNA: Towards Generative Detailed Neural Avatars

A novel method that learns to generate detailed 3D shapes of people in a variety of garments with corresponding skinning weights is proposed; it can be used for fitting human models to raw scans, outperforming the previous state of the art.

An efficient approach for sequential human performance capture from monocular video

This work proposes a learning-based approach that optimizes fine geometry from a monocular RGB camera using separate neural networks, and shares the benefits of classical optimization methods under challenging poses and novel views.

Capturing and Animation of Body and Clothing from Monocular Video

The proposed SCARF (Segmented Clothed Avatar Radiance Field), a hybrid model combining a mesh-based body with a neural radiance field, reconstructs clothing with higher visual quality than existing methods, and can be successfully transferred between avatars of different subjects.

Structured 3D Features for Reconstructing Relightable and Animatable Avatars

This work presents a complete 3D transformer-based attention framework which, given a single image of a person in an unconstrained pose, generates an animatable 3D reconstruction with albedo and illumination decomposition, using a single end-to-end model trained semi-supervised and requiring no additional postprocessing.

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

Vid2Avatar is presented, a method to learn human avatars from monocular in-the-wild videos that solves the tasks of scene decomposition and surface reconstruction directly in 3D by modeling both the human and the background in the scene jointly, parameterized via two separate neural fields.
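
As a loose illustration of compositing two separate neural fields along a camera ray, which is the high-level idea behind modeling the human and the background jointly, here is a minimal PyTorch sketch. The field interfaces, the density-weighted color mixing, and the soft human mask are assumptions for illustration only; Vid2Avatar's actual formulation (an SDF-based foreground, its specific ray parameterization, and its self-supervised decomposition losses) differs in detail.

```python
# Hedged sketch: volume-render one ray through a foreground (human) field and a
# background field, and read off a soft human mask from the foreground's share
# of the density. Field callables are stand-ins, not Vid2Avatar's networks.
import torch

def composite_ray(human_field, bg_field, ray_o, ray_d, t_vals):
    """ray_o, ray_d: (B, 3) ray origins/directions; t_vals: (S,) sample depths.
    Each field maps points (B, S, 3) to (density (B, S), color (B, S, 3))."""
    pts = ray_o[:, None, :] + t_vals[None, :, None] * ray_d[:, None, :]   # (B, S, 3)
    sigma_h, rgb_h = human_field(pts)
    sigma_b, rgb_b = bg_field(pts)
    sigma = sigma_h + sigma_b                                             # union of both media
    rgb = (sigma_h[..., None] * rgb_h + sigma_b[..., None] * rgb_b) / (sigma[..., None] + 1e-8)
    delta = torch.diff(t_vals, append=t_vals[-1:]).clamp(min=1e-8)        # (S,)
    alpha = 1.0 - torch.exp(-sigma * delta[None, :])                      # (B, S)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-8], dim=1), dim=1
    )[:, :-1]
    weights = alpha * trans                                               # (B, S)
    pixel = (weights[..., None] * rgb).sum(dim=1)                         # rendered color (B, 3)
    human_mask = (weights * sigma_h / (sigma + 1e-8)).sum(dim=1)          # soft decomposition (B,)
    return pixel, human_mask

# Toy usage with constant stand-in fields.
human = lambda p: (torch.ones(p.shape[:2]), torch.full((*p.shape[:2], 3), 0.8))
bg = lambda p: (0.1 * torch.ones(p.shape[:2]), torch.zeros(*p.shape[:2], 3))
rays_o = torch.zeros(4, 3)
rays_d = torch.tensor([0.0, 0.0, 1.0]).expand(4, 3)
pixel, mask = composite_ray(human, bg, rays_o, rays_d, torch.linspace(0.1, 2.0, 32))
```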

Invertible Neural Skinning

The strong performance of the Invertible Neural Skinning method is demonstrated by outperforming the state-of-the-art reposing techniques on clothed humans and preserving surface correspondences, while being an order of magnitude faster.

3D Clothed Human Reconstruction in the Wild

ClothWild is proposed, a 3D clothed human reconstruction framework that is the first to address robustness on in-the-wild images and designs a DensePose-based loss function to reduce the ambiguities of weak supervision.

CHORE: Contact, Human and Object REconstruction from a single RGB image

This work introduces CHORE, a novel method that learns to jointly reconstruct the human and the object from a single image, significantly outperforming the SOTA, and proposes a simple yet effective depth-aware scaling that allows more efficient shape learning on real data.

References

Showing 1-10 of 68 references

ARCH: Animatable Reconstruction of Clothed Humans

This paper proposes ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image and shows numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.

gDNA: Towards Generative Detailed Neural Avatars

A novel method that learns to generate detailed 3D shapes of people in a variety of garments with corresponding skinning weights is proposed; it can be used for fitting human models to raw scans, outperforming the previous state of the art.

SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks

SCANimate is presented, an end-to-end trainable framework that takes raw 3D scans of a clothed human and turns them into an animatable avatar that is driven by pose parameters and has realistic clothing that moves and deforms naturally.

ARCH++: Animation-Ready Clothed Human Reconstruction Revisited

This paper introduces an end-to-end point-based geometry encoder to better describe the semantics of the underlying 3D human body, replacing previous hand-crafted features, and proposes a co-supervising framework with cross-space consistency to jointly estimate the occupancy in both the posed and canonical spaces.

Learning to Dress 3D People in Generative Clothing

This work learns a generative 3D mesh model of clothed people from 3D scans with varying pose and clothing, and is the first generative model that directly dresses 3D human body meshes and generalizes to different poses.

Learning Cloth Dynamics: 3D+Texture Garment Reconstruction Benchmark

A large-scale dataset of animated garments with variable topology and type, called CLOTH3D++, containing RGBA video sequences paired with their corresponding 3D data, together with a competition to develop the best method for sequential 3D garment reconstruction.

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

The proposed Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object, achieves state-of-the-art performance on a public benchmark and outperforms the prior work for clothed human digitization from a single image.
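
The core of PIFu, as the summary above describes it, is a query that aligns a 3D point with its 2D pixel: project the point into the image, bilinearly sample a convolutional feature map there, and feed that feature plus the point's depth to an MLP that predicts occupancy. The following minimal PyTorch sketch shows that query under assumed feature dimensions and normalization conventions; module names and sizes are illustrative, not the released implementation.

```python
# Hedged sketch of a pixel-aligned implicit-function query in the spirit of PIFu.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedImplicitFunction(nn.Module):
    def __init__(self, feat_channels: int = 256, hidden: int = 512):
        super().__init__()
        # The MLP consumes a pixel-aligned image feature plus the point's depth.
        self.mlp = nn.Sequential(
            nn.Linear(feat_channels + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, feat_map: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        """feat_map: (B, C, H, W) features from a fully convolutional image encoder.
        points: (B, N, 3) query points in normalized camera space, x/y in [-1, 1].
        Returns occupancy probabilities of shape (B, N, 1)."""
        xy = points[..., :2].unsqueeze(2)                            # (B, N, 1, 2) sampling grid
        z = points[..., 2:]                                          # (B, N, 1) point depth
        sampled = F.grid_sample(feat_map, xy, align_corners=True)    # (B, C, N, 1)
        sampled = sampled.squeeze(-1).permute(0, 2, 1)               # (B, N, C) pixel-aligned feature
        return self.mlp(torch.cat([sampled, z], dim=-1))

# Toy usage: one image's feature map and 5000 random query points.
pifu = PixelAlignedImplicitFunction()
occ = pifu(torch.randn(1, 256, 128, 128), torch.rand(1, 5000, 3) * 2 - 1)   # (1, 5000, 1)
```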

Collaborative Regression of Expressive Bodies using Moderation

PIXIE is introduced, which produces animatable, whole-body 3D avatars with realistic facial detail from a single image, and is shown to estimate more accurate whole-body shape and more detailed face shape than the state of the art.

SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements

This work deforms surface elements based on a human body model such that large-scale deformations caused by articulation are explicitly separated from topological changes and local clothing deformations, and addresses the limitations of existing neural surface elements by regressing local geometry from local features.
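
To make the idea of articulated local surface elements concrete, here is a minimal PyTorch sketch in the spirit of the description above: each element is rigidly posed by a body-part transform taken from the body model (separating the large articulation-driven deformation), while a small network regresses local, pose-dependent point offsets (capturing clothing detail and local geometry). The decoder structure, feature dimensions, and names are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of articulated local surface elements: rigid per-element posing
# plus learned local offsets. Not the SCALE implementation.
import torch
import torch.nn as nn

class LocalElementDecoder(nn.Module):
    def __init__(self, feat_dim: int = 64, points_per_elem: int = 16):
        super().__init__()
        self.points_per_elem = points_per_elem
        # Regress each element's local point cloud from a pose-dependent local feature.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, points_per_elem * 3),
        )

    def forward(self, local_feats, elem_rot, elem_trans):
        """local_feats: (B, K, F) per-element features; elem_rot: (B, K, 3, 3) and
        elem_trans: (B, K, 3) rigid transforms of the K elements from the posed body.
        Returns a posed point cloud of shape (B, K * points_per_elem, 3)."""
        B, K, _ = local_feats.shape
        local_pts = self.mlp(local_feats).view(B, K, self.points_per_elem, 3)
        # Articulation is handled by the rigid transform; the regressed points only
        # encode small local geometry on top of it.
        posed = torch.einsum('bkij,bkpj->bkpi', elem_rot, local_pts) + elem_trans[:, :, None, :]
        return posed.reshape(B, K * self.points_per_elem, 3)

# Toy usage with identity rotations and zero translations for 200 elements.
decoder = LocalElementDecoder()
cloud = decoder(torch.randn(1, 200, 64),
                torch.eye(3).expand(1, 200, 3, 3),
                torch.zeros(1, 200, 3))               # (1, 3200, 3)
```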

Learning to Reconstruct People in Clothing From a Single RGB Camera

We present Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving with a reconstruction accuracy of 4
...