Model-based 3D Hand Reconstruction via Self-Supervised Learning

@article{Chen2021Modelbased3H,
  title={Model-based 3D Hand Reconstruction via Self-Supervised Learning},
  author={Yujin Chen and Zhigang Tu and Di Kang and Linchao Bao and Ying Zhang and Xuefei Zhe and Ruizhi Chen and Junsong Yuan},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={10446-10455}
}
Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint… 
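To make the supervision signal concrete, here is a minimal sketch of a 2D-keypoint reprojection loss of the kind such self-supervision relies on: predicted 3D joints are projected with the estimated camera and compared to off-the-shelf 2D detections, down-weighted by detector confidence. This is a schematic reading of the abstract, not the S2HAND code; the weak-perspective camera parameterization and all names are assumptions.

```python
import torch

def reprojection_loss(joints_3d, cam, keypoints_2d, conf):
    """joints_3d:   (B, 21, 3) predicted 3D hand joints.
    cam:          (B, 3) weak-perspective camera (scale, tx, ty) -- an assumption.
    keypoints_2d: (B, 21, 2) keypoints from an off-the-shelf 2D detector.
    conf:         (B, 21) detector confidences used to down-weight noisy joints."""
    s = cam[:, :1].unsqueeze(-1)                # (B, 1, 1) scale
    t = cam[:, 1:].unsqueeze(1)                 # (B, 1, 2) image-plane translation
    proj = s * joints_3d[..., :2] + t           # orthographic projection to 2D
    err = ((proj - keypoints_2d) ** 2).sum(-1)  # per-joint squared error
    return (conf * err).mean()
```

In a full pipeline a term like this would sit alongside rendering-based texture and silhouette terms, since the network also estimates shape and texture.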
Consistent 3D Hand Reconstruction in Video via Self-Supervised Learning
TLDR
S2HAND is proposed, a self-supervised 3D hand reconstruction model that can jointly estimate pose, shape, texture, and the camera viewpoint from a single RGB input through the supervision of easily accessible 2D detected keypoints.
End-to-end Weakly-supervised Multiple 3D Hand Mesh Reconstruction from Single Image
TLDR
This paper designs a multi-head autoencoder structure for multi-hand reconstruction, where each head network shares the same feature map and outputs the hand center, pose, and texture, respectively, and adopts a weakly-supervised scheme to alleviate the burden of expensive 3D real-world data annotations.
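A minimal sketch of the multi-head idea described above, assuming a fully-convolutional form: one shared feature map feeds separate heads for hand center, pose, and texture. Channel counts and the per-pixel output layout are illustrative assumptions, not the paper's architecture.

```python
import torch.nn as nn

class MultiHandHeads(nn.Module):
    """Separate 1x1-conv heads reading the same backbone feature map."""
    def __init__(self, feat_ch=256, n_pose=48, n_tex=10):
        super().__init__()
        self.center = nn.Conv2d(feat_ch, 1, 1)       # hand-center heatmap
        self.pose   = nn.Conv2d(feat_ch, n_pose, 1)  # pose params per location
        self.tex    = nn.Conv2d(feat_ch, n_tex, 1)   # texture params per location

    def forward(self, feat):                         # feat: (B, C, H, W), shared
        return self.center(feat), self.pose(feat), self.tex(feat)
```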
Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey
TLDR
This survey presents a comprehensive analysis of 3D hand pose estimation from the perspective of efficient annotation and learning, and investigates annotation methods classified as manual, synthetic-model-based, hand-sensor-based, and computational approaches.
Joint Hand-Object 3D Reconstruction From a Single Image With Cross-Branch Feature Fusion
TLDR
This work proposes to consider the hand and object jointly in feature space and explores the reciprocity of the two branches through cross-branch feature fusion architectures with MLP or LSTM units, significantly outperforming existing approaches in terms of object reconstruction accuracy.
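A toy sketch of the MLP variant of cross-branch fusion: each branch's feature is refined with a residual computed from the other branch before decoding. Dimensions and module names are hypothetical, not taken from the paper.

```python
import torch.nn as nn

class CrossBranchFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.hand_from_obj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.obj_from_hand = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, f_hand, f_obj):
        # each branch borrows information from the other via a residual update
        return f_hand + self.hand_from_obj(f_obj), f_obj + self.obj_from_hand(f_hand)
```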
HandTailor: Towards High-Precision Monocular 3D Hand Recovery
TLDR
This work introduces a novel framework HandTailor, which combines a learning-based hand module and an optimization-based tailor module to achieve high-precision hand mesh recovery from a monocular RGB image.
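A minimal sketch of the learn-then-optimize pattern such a framework combines: a network supplies an initial parameter estimate, and a short gradient-descent loop "tailors" it to image evidence. The energy callable (e.g. a 2D reprojection error on the posed hand model) and all names are hypothetical, not HandTailor's implementation.

```python
import torch

def tailor(theta_init, energy, steps=50, lr=1e-2):
    """Refine network-predicted parameters by minimizing an image-based energy."""
    theta = theta_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        e = energy(theta)   # scalar fit-to-image term, e.g. keypoint reprojection
        e.backward()
        opt.step()
    return theta.detach()
```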
Multi-view Image-based Hand Geometry Refinement using Differentiable Monte Carlo Ray Tracing
TLDR
An image-based refinement is achieved through differentiable ray tracing, a method not previously applied to this problem, and is shown to be superior to the approximate alternatives employed in the past.
Semi-Supervised 3D Hand Shape and Pose Estimation with Label Propagation
TLDR
A Pose Alignment network is proposed to propagate 3D annotations from labelled frames to nearby unlabelled frames in sparsely annotated videos, incorporating alignment supervision on pairs of labelled and unlabelled frames to improve pose estimation accuracy.
Local and Global Point Cloud Reconstruction for 3D Hand Pose Estimation
TLDR
This paper presents a novel pipeline for local and global point cloud reconstruction using a 3D hand template while learning a latent representation for pose estimation, and introduces a new multi-view hand posture dataset to obtain complete 3D point clouds of the hand in the real world.
InterNet+: A Light Network for Hand Pose Estimation
TLDR
The feature extractor is redesigned to incorporate recent advances in computer vision, such as the ACON activation function and a new attention mechanism module, which better extract global features from an RGB image of the hand and lead to a greater performance improvement compared to InterNet and other similar networks.
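Since the TLDR names the ACON activation, here is a sketch of the ACON-C variant from "Activate or Not: Learning Customized Activation" (Ma et al., CVPR 2021), which learns per-channel parameters that interpolate between linear and Swish-like behavior. This is written from the published formula, not from the InterNet+ code.

```python
import torch
import torch.nn as nn

class AconC(nn.Module):
    """ACON-C: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x."""
    def __init__(self, channels):
        super().__init__()
        self.p1   = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.p2   = nn.Parameter(torch.randn(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):                  # x: (B, C, H, W)
        dp = (self.p1 - self.p2) * x
        return dp * torch.sigmoid(self.beta * dp) + self.p2 * x
```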
...

References

Showing 1-10 of 55 references
3D Hand Shape and Pose Estimation From a Single RGB Image
TLDR
This work proposes a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface that contains richer information about both 3D hand shape and pose, and proposes a weakly-supervised approach that leverages the depth map as weak supervision in training.
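For reference, a schematic graph-convolution layer of the kind such a mesh decoder stacks over hand-mesh vertices. The row-normalized adjacency form follows Kipf & Welling (2017) and is an assumption about the layer type, not the paper's exact operator.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_dim, out_dim, adj):
        """adj: (V, V) mesh adjacency matrix (with self-loops) as a float tensor."""
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        self.register_buffer("adj_norm", adj / deg)   # row-normalized neighborhood averaging

    def forward(self, x):          # x: (B, V, in_dim) per-vertex features
        return torch.relu(self.adj_norm @ self.fc(x))
```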
3D Hand Shape and Pose From Images in the Wild
TLDR
This work presents the first end-to-end deep learning based method that predicts both 3D hand shape and pose from RGB images in the wild, consisting of the concatenation of a deep convolutional encoder and a fixed model-based decoder.
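A minimal sketch of that encoder-plus-fixed-decoder layout: a CNN regresses low-dimensional model parameters, and a frozen differentiable hand model maps them to a mesh. The `hand_model` argument is a placeholder for a MANO-like layer, and the parameter count is an assumption.

```python
import torch.nn as nn
import torchvision

class EncoderWithModelDecoder(nn.Module):
    def __init__(self, hand_model, n_params=58):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, n_params)
        self.encoder = backbone            # trained: image -> model parameters
        self.decoder = hand_model          # fixed model-based decoder, not trained
        for p in self.decoder.parameters():
            p.requires_grad = False

    def forward(self, img):                # img: (B, 3, H, W)
        params = self.encoder(img)         # pose/shape/camera parameters
        return self.decoder(params)        # e.g. mesh vertices and joints
```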
Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images
TLDR
A weakly-supervised method that adapts from a fully-annotated synthetic dataset to a weakly-labeled real-world dataset with the aid of a depth regularizer, which generates depth maps from the predicted 3D pose and serves as weak supervision for 3D pose regression.
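A loose sketch of the depth-regularizer idea: a small generator maps the predicted 3D pose to a depth map, which is compared against the observed depth image as weak supervision. The MLP generator, shapes, and names here are illustrative assumptions, not the paper's network.

```python
import torch.nn as nn

class DepthRegularizer(nn.Module):
    def __init__(self, n_joints=21, out_hw=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_joints * 3, 512), nn.ReLU(),
            nn.Linear(512, out_hw * out_hw),
        )
        self.out_hw = out_hw

    def forward(self, pose3d):             # pose3d: (B, 21, 3)
        d = self.net(pose3d.flatten(1))
        return d.view(-1, 1, self.out_hw, self.out_hw)

# training sketch: loss = pose_loss + lam * l1(depth_reg(pred_pose), gt_depth)
```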
Shape and Viewpoint without Keypoints
We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground-truth 3D shape, multi-view, camera viewpoint, or keypoint supervision.
HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization
3D hand reconstruction from images is a widely-studied problem in computer vision and graphics, and has particularly high relevance for virtual and augmented reality. Although several 3D hand…
End-to-End Hand Mesh Recovery From a Monocular RGB Image
TLDR
Qualitative experiments show that the HAMR framework is capable of recovering an appealing 3D hand mesh even in the presence of severe occlusions, and it outperforms state-of-the-art methods for both 2D and 3D hand pose estimation from a monocular RGB image on several benchmark datasets.
Self-Supervised Learning of Detailed 3D Face Reconstruction
TLDR
An end-to-end learning framework for detailed 3D face reconstruction from a single image that combines a photometric loss and a facial perceptual loss between the input face and the rendered face, and uses a displacement map in UV-space to represent the 3D face.
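A minimal sketch of a masked photometric term of the sort described above: the rendered face is compared to the input image only where the renderer covers pixels. This is a generic formulation, not the paper's exact loss.

```python
import torch

def photometric_loss(rendered, image, mask):
    """rendered, image: (B, 3, H, W) in the same color space; mask: (B, 1, H, W)
    render coverage (1 where the face model projects, 0 elsewhere)."""
    diff = (rendered - image).abs() * mask
    return diff.sum() / torch.clamp(mask.sum(), min=1)
```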
Learning to Estimate 3D Hand Pose from Single RGB Images
TLDR
A deep network is proposed that learns a network-implicit 3D articulation prior that yields good estimates of the 3D pose from regular RGB images, and a large scale 3D hand pose dataset based on synthetic hand models is introduced.
Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz
TLDR
The first approach that jointly learns a regressor for face shape, expression, reflectance, and illumination on the basis of a concurrently learned parametric face model is presented, which compares favorably to the state of the art in terms of reconstruction quality, generalizes better to real-world faces, and runs at over 250 Hz.
Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs
TLDR
This work proposes to first project the query depth image onto three orthogonal planes and utilize these multi-view projections to regress 2D heat-maps that estimate the joint positions on each plane, which are then fused with learned pose priors to produce the final 3D hand pose estimate.
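A simplified sketch of the projection step: points recovered from the depth image are splatted onto the three orthogonal planes to form per-view inputs. This binary-occupancy version is an assumption for illustration; the paper regresses 2D heat-maps per plane from such projections.

```python
import numpy as np

def project_three_views(points, res=64):
    """points: (N, 3) hand points normalized to [0, 1). Returns three (res, res) maps."""
    idx = np.clip((points * res).astype(int), 0, res - 1)
    views = []
    for a, b in [(0, 1), (1, 2), (0, 2)]:      # xy, yz, xz planes
        img = np.zeros((res, res), dtype=np.float32)
        img[idx[:, a], idx[:, b]] = 1.0        # splat each point onto the plane
        views.append(img)
    return views
```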
...