V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

  • Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Most existing deep learning-based methods for 3D hand and human pose estimation from a single depth map follow a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). […] To overcome these weaknesses, the 3D hand and human pose estimation problem from a single depth map is first cast as a voxel-to-voxel prediction that uses a 3D voxelized grid and estimates…
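As a rough illustration of the voxel-to-voxel idea, the sketch below (NumPy only; the function names and grid parameters are illustrative assumptions, not taken from the paper) voxelizes a 3D point cloud into an occupancy grid and reads a 3D keypoint back out of a per-voxel likelihood volume via argmax:

```python
import numpy as np

def depth_to_voxels(points, origin, voxel_size, grid_dim):
    """Voxelize a 3D point cloud (e.g. back-projected from a depth map)
    into a binary occupancy grid of shape (grid_dim,)*3."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    grid = np.zeros((grid_dim,) * 3, dtype=np.float32)
    valid = np.all((idx >= 0) & (idx < grid_dim), axis=1)
    grid[tuple(idx[valid].T)] = 1.0
    return grid

def likelihood_to_keypoint(likelihood, origin, voxel_size):
    """Recover a 3D keypoint as the centre of the most likely voxel
    in a per-voxel likelihood volume."""
    vox = np.array(np.unravel_index(np.argmax(likelihood), likelihood.shape))
    return origin + (vox + 0.5) * voxel_size
```

In the actual network the likelihood volume is predicted by a 3D CNN; the readout above is only the final decoding step.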

HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map

This work proposes a novel architecture with 3D convolutions, trained in a weakly supervised manner, that produces visually more reasonable and realistic hand shapes on the NYU and BigHand2.2M datasets than existing approaches.

FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks

  • Rohan Lekhwani
  • Computer Science
    Advances in Intelligent Systems and Computing
  • 2020
This paper presents a novel approach to estimating 3D hand joint locations from 2D depth images using a voxel-to-coordinate (V2C) approach that outperforms state-of-the-art methods in the time it takes to train and to predict 3D hand pose locations.

Supervised High-Dimension Endecoder Net: 3D End to End Prediction Network for Mark-less Human Pose Estimation from Single Depth Map

  • Lili Shen
  • Ying Chen
  • Computer Science
    2019 5th International Conference on Control, Automation and Robotics (ICCAR)
  • 2019
A network called the Supervised High-Dimension Endecoder Network is designed to predict keypoints of a marker-less human in 3D space from a single depth map, and shows improved prediction accuracy compared to state-of-the-art approaches.

Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation

This work proposes to leverage recent advances in reliable 2D pose estimation with Convolutional Neural Networks to estimate the 3D pose of people from depth images in multi-person Human-Robot Interaction (HRI) scenarios by using the depth information to obtain 3D lifted points from 2D body landmark detections.
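The depth-based 2D-to-3D lifting step described above can be sketched with the standard pinhole-camera back-projection (a minimal illustration; the function name and intrinsic values are hypothetical, not from the paper):

```python
import numpy as np

def lift_2d_to_3d(keypoints_2d, depth_map, fx, fy, cx, cy):
    """Back-project 2D body-landmark detections to 3D camera coordinates
    using the depth value at each detection (pinhole camera model)."""
    pts3d = []
    for (u, v) in keypoints_2d:
        z = depth_map[int(v), int(u)]   # depth (metres) at the pixel
        x = (u - cx) * z / fx           # pinhole back-projection
        y = (v - cy) * z / fy
        pts3d.append((x, y, z))
    return np.array(pts3d)
```

In practice the depth is usually read from a small patch around each detection rather than a single pixel, to be robust to sensor noise.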

Graph-Based CNNs With Self-Supervised Module for 3D Hand Pose Estimation From Monocular RGB

An end-to-end network for predicting 3D hand pose from a single RGB image is designed and implemented, and the proposed method achieves state-of-the-art performance on the benchmark datasets.

HandVoxNet++: 3D Hand Shape and Pose Estimation Using Voxel-Based Neural Networks

The proposed HandVoxNet++, a voxel-based deep network with 3D and graph convolutions trained in a fully supervised manner, achieves state-of-the-art performance on the SynHand5M and HANDS19 datasets.

A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image

The proposed 3D articulated pose estimation paradigm differs from state-of-the-art encoder-decoder-based FCN, 3D CNN, and point-set-based approaches: a 2D CNN backbone drives A2J without time-consuming 3D convolutional or deconvolutional layers.

DeepHPS: End-to-end Estimation of 3D Hand Pose and Shape by Learning from Synthetic Depth

A fully supervised deep network is proposed that learns to jointly estimate a full 3D hand mesh representation and pose from a single depth image, improving the results of model-based (hybrid) learning methods on two public benchmarks.

A graph-based approach for absolute 3D hand pose estimation using a single RGB image

This work proposes a multi-stage GCN-based (Graph Convolutional Network) approach to estimate the absolute 3D hand pose from a single RGB image, and validates it on a newly created dataset containing RGB hand images with accurate 3D pose annotations and large lighting and pose variations.

Pixel-wise Regression: 3D Hand Pose Estimation via Spatial-form Representation and Differentiable Decoder

A novel pixel-wise regression method, which uses a spatial-form representation (SFR) and a differentiable decoder (DD), is proposed to address two key problems in 3D hand pose estimation from a single depth image and to reduce the mean 3D joint error.

Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs

This work proposes to first project the query depth image onto three orthogonal planes and utilize these multi-view projections to regress 2D heat-maps that estimate the joint positions on each plane; the per-plane estimates are then fused with learned pose priors to produce the final 3D hand pose.
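For a voxelized input, projection onto the three orthogonal planes can be approximated by max-projections along each axis (a simplified sketch under that assumption; the paper itself projects the raw depth points rather than a voxel grid):

```python
import numpy as np

def orthogonal_projections(volume):
    """Project a binary occupancy volume onto the three orthogonal
    planes (xy, yz, xz) by taking the maximum along each axis."""
    xy = volume.max(axis=2)  # collapse z
    yz = volume.max(axis=0)  # collapse x
    xz = volume.max(axis=1)  # collapse y
    return xy, yz, xz
```

Each of the three 2D projections would then be fed to its own CNN to regress per-plane joint heat-maps.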

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

This paper proposes a fine discretization of the 3D space around the subject and trains a ConvNet to predict per-voxel likelihoods for each joint, which creates a natural representation for 3D pose and greatly improves performance over the direct regression of joint coordinates.
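A differentiable way to read a coordinate out of such a per-voxel likelihood volume, used in later volumetric-heatmap work, is the softmax-weighted expectation (soft-argmax); a minimal NumPy sketch (function name is my own):

```python
import numpy as np

def soft_argmax_3d(heatmap):
    """Differentiable expected-coordinate readout from a 3D heatmap:
    softmax over all voxels, then a probability-weighted average of
    voxel indices along each axis."""
    probs = np.exp(heatmap - heatmap.max())   # stabilized softmax
    probs /= probs.sum()
    coords = np.stack(np.meshgrid(
        *[np.arange(d) for d in heatmap.shape], indexing="ij"), axis=-1)
    return (probs[..., None] * coords).reshape(-1, 3).sum(axis=0)
```

Unlike a hard argmax, this readout lets the coordinate loss backpropagate through the heatmap during training.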

3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images

Experiments show that the proposed 3D CNN based approach outperforms state-of-the-art methods on two challenging hand pose datasets, and is very efficient as the implementation runs at over 215 fps on a standard computer with a single GPU.

Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals

This paper investigates the top 10 state-of-the-art methods on three tasks: single frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction.

Towards Good Practices for Deep 3D Hand Pose Estimation

Crossing Nets: Combining GANs and VAEs with a Shared Latent Space for Hand Pose Estimation

This work proposes modelling the statistical relationship of 3D hand poses and corresponding depth images using two deep generative models with a shared latent space to prevent over-fitting and to better exploit unlabeled depth maps.

End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth data

Experimental results suggest that feeding a tree-shaped CNN, specialized in local poses, into a fusion network for modeling joints correlations and dependencies, helps to increase the precision of final estimations, outperforming state-of-the-art results on NYU and SyntheticHand datasets.

DeepHand: Robust Hand Pose Estimation by Completing a Matrix Imputed with Deep Features

The proposed DeepHand estimates the 3D pose of a hand using depth data from commercial 3D sensors, and compares favorably to state-of-the-art methods while achieving real-time performance (≈ 32 FPS) on a standard computer.

Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture

The Latent Regression Forest (LRF) is presented, a novel framework for real-time 3D hand pose estimation from a single depth image; the LRF outperforms state-of-the-art methods in both accuracy and efficiency.