HandVoxNet++: 3D Hand Shape and Pose Estimation Using Voxel-Based Neural Networks

Authors: Jameel Nawaz Malik, Soshi Shimada, Ahmed Elhayek, Sk Aziz Ali, Christian Theobalt, Vladislav Golyanik, Didier Stricker
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

3D hand shape and pose estimation from a single depth map is a new and challenging computer vision problem with many applications. Existing methods addressing it directly regress hand meshes via 2D convolutional neural networks, which leads to artifacts due to perspective distortions in the images. To address the limitations of the existing methods, we develop HandVoxNet++, i.e., a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner. The input to our… 

THOR-Net: End-to-end Graformer-based Realistic Two Hands and Object Reconstruction with Self-supervision

THOR-Net is proposed, which combines the power of GCNs, Transformers, and self-supervision to realistically reconstruct two hands and an object from a single RGB image, achieving state-of-the-art results in hand shape estimation on the HO-3D dataset.

Scene-aware Egocentric 3D Human Pose Estimation

Experimental results on the new evaluation sequences show that the predicted 3D egocentric poses are accurate and physically plausible in terms of human-scene interaction, demonstrating that the method outperforms state-of-the-art methods both quantitatively and qualitatively.



HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map

This work proposes a novel architecture with 3D convolutions trained in a weakly-supervised manner that produces visually more reasonable and realistic hand shapes on NYU and BigHand2.2M datasets compared to the existing approaches.

3D Hand Shape and Pose Estimation From a Single RGB Image

This work proposes a Graph Convolutional Neural Network (Graph CNN) based method that reconstructs a full 3D mesh of the hand surface, which contains richer information about both 3D hand shape and pose, together with a weakly-supervised training approach that uses the depth map as weak supervision.

Real-Time 3D Hand Pose Estimation with 3D Convolutional Neural Networks

A novel 3D CNN-based method is presented that captures the 3D spatial structure of the hand and accurately regresses the full 3D hand pose in a single pass; the implementation runs at over 91 frames per second on a standard computer with a single GPU.

V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

This model is designed as a 3D CNN that provides accurate estimates while running in real time. It outperforms previous methods on almost all publicly available 3D hand and human pose estimation datasets and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.

WHSP-Net: A Weakly-Supervised Approach for 3D Hand Shape and Pose Recovery from a Single Depth Image

A novel weakly-supervised approach for 3D hand shape and pose recovery (named WHSP-Net) from a single depth image, which learns shapes from unlabeled real data and labeled synthetic data. It outperforms state-of-the-art methods that output more than the joint positions and shows competitive performance on the 3D pose estimation task.

End-to-End Hand Mesh Recovery From a Monocular RGB Image

Qualitative experiments show that the HAMR framework is capable of recovering an appealing 3D hand mesh even in the presence of severe occlusions, and that it outperforms state-of-the-art methods for both 2D and 3D hand pose estimation from a monocular RGB image on several benchmark datasets.

3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images

Experiments show that the proposed 3D CNN based approach outperforms state-of-the-art methods on two challenging hand pose datasets, and is very efficient as the implementation runs at over 215 fps on a standard computer with a single GPU.

Structure-Aware 3D Hand Pose Regression from a Single Depth Image

A novel structure-aware CNN-based algorithm is proposed that learns to automatically segment the hand from a raw depth image and jointly estimate the 3D hand pose, with new structural constraints that maintain the structural relations between the estimated joint keypoints.

Convolutional Mesh Regression for Single-Image Human Shape Reconstruction

This paper addresses the problem of 3D human pose and shape estimation from a single image by proposing a graph-based mesh regression, which outperforms comparable baselines that rely on model parameter regression and achieves state-of-the-art results among model-based pose estimation approaches.