HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map

@article{Malik2020HandVoxNetDV,
  title={HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map},
  author={Jameel Malik and I. Abdelaziz and Ahmed Elhayek and Soshi Shimada and Sk Aziz Ali and Vladislav Golyanik and Christian Theobalt and Didier Stricker},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={7111-7120}
}
3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. The state-of-the-art methods directly regress 3D hand meshes from 2D depth images via 2D convolutional neural networks, which leads to artefacts in the estimations due to perspective distortions in the images. In contrast, we propose a novel architecture with 3D convolutions trained in a weakly-supervised manner. The input to our method is a 3D voxelized depth map… 
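To make the voxelized input described in the abstract concrete, the sketch below shows one common way to turn a depth map into a binary occupancy grid around the hand. It is an illustrative example only: the camera intrinsics, cube size, and grid resolution are placeholder assumptions, not values taken from the paper.

import numpy as np

def depth_to_voxels(depth, fx, fy, cx, cy, center, cube_mm=250.0, res=88):
    """Back-project a depth map (in mm) to 3D points and voxelize the
    region inside a cube of side `cube_mm` centered at `center` (x, y, z in mm).
    Returns a binary occupancy grid of shape (res, res, res).
    Illustrative sketch only; intrinsics, cube size, and resolution are
    placeholders, not values from the paper."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32)
    valid = z > 0                      # zero depth = missing measurement
    x = (us - cx) * z / fx             # pinhole back-projection
    y = (vs - cy) * z / fy
    pts = np.stack([x[valid], y[valid], z[valid]], axis=1) - np.asarray(center)

    half = cube_mm / 2.0
    inside = np.all(np.abs(pts) < half, axis=1)
    idx = ((pts[inside] + half) / cube_mm * res).astype(int)
    idx = np.clip(idx, 0, res - 1)

    grid = np.zeros((res, res, res), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0   # mark occupied voxels
    return grid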
HandVoxNet++: 3D Hand Shape and Pose Estimation using Voxel-Based Neural Networks
TLDR
This paper develops HandVoxNet++, a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner, which gains 41.09% and 13.7% higher shape alignment accuracy on the SynHand5M and HANDS19 datasets, respectively.
HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization
3D hand reconstruction from images is a widely-studied problem in computer vision and graphics, and has a particularly high relevance for virtual and augmented reality. Although several 3D hand
3D Hand Pose and Shape Estimation from RGB Images for Improved Keypoint-Based Hand-Gesture Recognition
TLDR
A keypoint-based end-to-end framework for 3D hand pose and shape estimation is presented and successfully applied to hand-gesture recognition as a case study, indicating that the method is an effective solution able to generate stable 3D estimates of hand pose and shape.
Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction
  • Xiao Tang, Tianyu Wang, Chi-Wing Fu · ArXiv, 2021
TLDR
The quality of the results outperforms state-of-the-art methods on hand mesh/pose precision and hand-image alignment, and several real-time AR scenarios are showcased.
A Pipeline for Hand 2-D Keypoint Localization Using Unpaired Image to Image Translation
TLDR
A Cycle-consistent Generative Adversarial Network is used to apply unpaired image-to-image translation, generating a depth image with colored predictions on the fingertips, wrist, and palm from a real depth image; the approach achieves visually promising results on noisy depth images captured using the Microsoft Kinect.
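The unpaired translation idea in the entry above rests on a cycle-consistency objective. The PyTorch-style sketch below shows that term with hypothetical generator names and a placeholder loss weight; it is a generic CycleGAN-style example, not the paper's implementation.

import torch.nn.functional as F

def cycle_consistency_loss(G_ab, G_ba, real_a, real_b, lam=10.0):
    """CycleGAN-style cycle loss: translating A -> B -> A (and B -> A -> B)
    should reconstruct the original image. `G_ab` and `G_ba` are the two
    generators; `lam` is the usual cycle weight (placeholder value)."""
    rec_a = G_ba(G_ab(real_a))   # A -> B -> A
    rec_b = G_ab(G_ba(real_b))   # B -> A -> B
    return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))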
Weakly-supervised hand part segmentation from depth images
TLDR
This paper proposes a data-driven method for hand part segmentation on depth maps that requires no extra effort to obtain segmentation labels, and shows that an mIoU of 42% can be achieved with a model trained without segmentation-based labels.
Local and Global Point Cloud Reconstruction for 3D Hand Pose Estimation
TLDR
This paper presents a novel pipeline for local and global point cloud reconstruction using a 3D hand template while learning a latent representation for pose estimation, and introduces a new multi-view hand posture dataset to obtain complete 3D point clouds of the hand in the real world.
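Point-cloud reconstruction pipelines of this kind are typically trained with a set-to-set objective such as the Chamfer distance. The sketch below illustrates that loss in PyTorch as a generic example of the kind of objective involved, not the paper's exact formulation.

import torch

def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between two point sets of shape
    (B, N, 3) and (B, M, 3). A common reconstruction loss for point
    clouds; shown only as an illustration, not the paper's exact loss."""
    d = torch.cdist(pred, target, p=2) ** 2       # pairwise squared distances (B, N, M)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()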
3D Hand Pose Estimation via aligned latent space injection and kinematic losses
TLDR
A disentanglement stage is initially proposed to separate the significant pose-specific information from the irrelevant background noise and illumination variations of RGB images, and a variational alignment stage is introduced to accurately align the latent spaces of the pose-specific and the true hand pose information, effectively improving the discrimination ability of the proposed methodology.
EventHands: Real-Time Neural 3D Hand Reconstruction from an Event Stream
TLDR
This work addresses 3D hand pose estimation from monocular videos for the first time using a single event camera, i.e., an asynchronous vision sensor reacting on brightness changes, and designs a new neural approach which accepts a new event stream representation suitable for learning, trained on newly-generated synthetic event streams.
EventHands: Real-Time Neural 3D Hand Pose Estimation from an Event Stream
TLDR
This work addresses 3D hand pose estimation from monocular videos for the first time using a single event camera, i.e., an asynchronous vision sensor reacting on brightness changes, which has characteristics previously not demonstrated with a single RGB or depth camera.

References

SHOWING 1-10 OF 37 REFERENCES
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
TLDR
This model is designed as a 3D CNN that provides accurate estimates while running in real time, outperforms previous methods on almost all publicly available 3D hand and human pose estimation datasets, and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
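The voxel-to-voxel formulation can be illustrated with a tiny 3D CNN that maps an occupancy grid to one volumetric heatmap per joint and reads out coordinates with a soft-argmax. The layer sizes below are placeholders and deliberately far smaller than the V2V-PoseNet encoder-decoder; this is a shape-level sketch of the idea only.

import torch
import torch.nn as nn

class TinyVoxelToVoxel(nn.Module):
    """Minimal voxel-to-voxel sketch: occupancy grid in, one 3D heatmap
    per joint out. Only meant to show the input/output structure of the
    approach, not the actual V2V-PoseNet architecture."""
    def __init__(self, num_joints=21):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 7, padding=3), nn.ReLU(),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, num_joints, 1),
        )

    def forward(self, vox):                     # vox: (B, 1, D, H, W)
        heat = self.net(vox)                    # (B, J, D, H, W)
        prob = heat.flatten(2).softmax(dim=-1)  # per-joint distribution over voxels
        D, H, W = vox.shape[2:]
        grid = torch.stack(torch.meshgrid(
            torch.arange(D, device=vox.device),
            torch.arange(H, device=vox.device),
            torch.arange(W, device=vox.device),
            indexing="ij"), dim=-1).float()     # (D, H, W, 3) voxel coordinates
        coords = prob @ grid.reshape(-1, 3)     # soft-argmax: (B, J, 3) expected voxel coords
        return heat, coords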
3D Hand Shape and Pose Estimation From a Single RGB Image
TLDR
This work proposes a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of hand surface that contains richer information of both 3D hand shape and pose and proposes a weakly-supervised approach by leveraging the depth map as a weak supervision in training.
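Graph convolutions over the hand-mesh topology, as used in the entry above, can be sketched with a single GCN-style layer that mixes each vertex with its neighbours under a normalized adjacency matrix. This is a generic illustration of such a layer, not the paper's exact formulation.

import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """One graph-convolution layer over mesh vertices: each vertex feature
    becomes a learned transform of the normalized average of itself and its
    neighbours. Generic GCN-style layer shown for illustration only."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        # adjacency: (V, V) 0/1 matrix built from the hand-mesh edges.
        A = adjacency + torch.eye(adjacency.shape[0])          # add self-loops
        d = A.sum(dim=1)
        self.register_buffer("A_norm", A / d.sqrt().outer(d.sqrt()))
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):          # x: (B, V, in_dim) vertex features
        return torch.relu(self.linear(self.A_norm @ x))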
DeepHPS: End-to-end Estimation of 3D Hand Pose and Shape by Learning from Synthetic Depth
TLDR
A fully supervised deep network is proposed which learns to jointly estimate a full 3D hand mesh representation and pose from a single depth image, improving the results of model-based (hybrid) learning methods on two public benchmarks.
WHSP-Net: A Weakly-Supervised Approach for 3D Hand Shape and Pose Recovery from a Single Depth Image
TLDR
A novel weakly-supervised approach for 3D hand shape and pose recovery (named WHSP-Net) from a single depth image by learning shapes from unlabeled real data and labeled synthetic data that outperforms state-of-the-art methods that output more than the joint positions and shows competitive performance on 3D pose estimation task.
Simple and effective deep hand shape and pose regression from a single depth image
TLDR
This study developed a simple and effective real-time CNN-based direct regression approach for simultaneously estimating the 3D hand shape and pose, together with structure constraints, for both egocentric and third-person viewpoints, by learning from synthetic depth.
3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images
TLDR
Experiments show that the proposed 3D CNN based approach outperforms state-of-the-art methods on two challenging hand pose datasets, and is very efficient as the implementation runs at over 215 fps on a standard computer with a single GPU.
End-to-End Hand Mesh Recovery From a Monocular RGB Image
TLDR
Qualitative experiments show that the HAMR framework is capable of recovering appealing 3D hand mesh even in the presence of severe occlusions, and outperforms the state-of-the-art methods for both 2D and 3D hand pose estimation from a monocular RGB image on several benchmark datasets.
Structure-Aware 3D Hand Pose Regression from a Single Depth Image
TLDR
A novel structure-aware CNN-based algorithm which learns to automatically segment the hand from a raw depth image and estimate 3D hand pose jointly with new structural constraints to maintain a structural relation between the estimated joint keypoints is proposed.
Dense 3D Regression for Hand Pose Estimation
TLDR
A simple and effective method for 3D hand pose estimation from a single depth frame, based on dense pixel-wise estimation, that outperforms previous state-of-the-art approaches by a large margin.
Simultaneous Hand Pose and Skeleton Bone-Lengths Estimation from a Single Depth Image
TLDR
This work introduces a novel hybrid algorithm for estimating the 3D hand pose as well as bone-lengths of the hand skeleton at the same time, from a single depth image, and shows improved accuracy over the state-of-the-art on the combined dataset and the ICVL dataset that contain multiple subjects.