Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images

@inproceedings{Cai2018WeaklySupervised3H,
  title={Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images},
  author={Yujun Cai and Liuhao Ge and Jianfei Cai and Junsong Yuan},
  booktitle={ECCV},
  year={2018}
}
Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to substantial depth ambiguity and the difficulty of obtaining fully-annotated training data. […] Key Method: Particularly, we propose a weakly-supervised method, adapting from a fully-annotated synthetic dataset to a weakly-labeled real-world dataset with the aid of a depth regularizer, which generates depth maps from the predicted 3D pose and serves as weak supervision for 3D pose…
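The weak-supervision idea in the Key Method can be illustrated with a toy sketch: render a coarse depth map from the predicted 3D joints and penalize its disagreement with a reference depth map. This is a minimal numpy stand-in, not the paper's learned depth regularizer; the function names `render_joint_depth` and `weak_depth_loss` and the joint-splatting scheme are hypothetical simplifications.

```python
import numpy as np

def render_joint_depth(joints_xyz, img_size=32):
    """Splat predicted 3D joints onto a coarse depth map.

    joints_xyz: (J, 3) array with x, y in [0, 1) image coordinates
    and z as depth. A toy stand-in for a learned module that
    generates a depth map from the predicted 3D pose.
    """
    depth = np.zeros((img_size, img_size))
    for x, y, z in joints_xyz:
        u = min(int(x * img_size), img_size - 1)
        v = min(int(y * img_size), img_size - 1)
        # keep the nearest (smallest-depth) joint at each pixel
        depth[v, u] = z if depth[v, u] == 0 else min(depth[v, u], z)
    return depth

def weak_depth_loss(pred_joints, ref_depth):
    """Mean L1 error between the rendered depth map and a reference
    depth map: depth acts as weak supervision for 3D pose on images
    that lack 3D joint annotations."""
    rendered = render_joint_depth(pred_joints, ref_depth.shape[0])
    return np.abs(rendered - ref_depth).mean()
```

In this toy version the loss is zero exactly when the rendered map matches the reference; in the paper the rendering is a learned network and the loss is backpropagated to the pose estimator.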
3D Hand Pose Estimation Using Synthetic Data and Weakly Labeled RGB Images
TLDR
A weakly-supervised method, adapting from a fully-annotated synthetic dataset to a weakly-labeled real-world single-RGB dataset with the aid of a depth regularizer, which serves as weak supervision for 3D pose prediction, proving the effectiveness of the proposed depth regularizer and the CVAE-based framework.
Adaptive Wasserstein Hourglass for Weakly Supervised Hand Pose Estimation from Monocular RGB
TLDR
A domain adaptation method called Adaptive Wasserstein Hourglass (AW Hourglass) is proposed for weakly-supervised 3D hand pose estimation, which aims to distinguish the difference and explore the common characteristics of synthetic and real-world datasets.
3D Hand Pose Estimation from RGB Using Privileged Learning with Depth Data
TLDR
By using paired RGB and depth images, this paper is able to supervise the RGB-based network to learn middle-layer features that mimic those of a network trained on large-scale, accurately annotated depth data.
3D Hand Shape and Pose Estimation From a Single RGB Image
TLDR
This work proposes a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface that contains richer information about both 3D hand shape and pose, and proposes a weakly-supervised approach that leverages the depth map as weak supervision during training.
Temporal-Aware Self-Supervised Learning for 3D Hand Pose and Mesh Estimation in Videos
TLDR
A new framework is proposed for training 3D pose estimation models from RGB images without explicit 3D annotations, i.e., trained with only 2D information; it achieves surprisingly good results, with 3D estimation accuracy on par with state-of-the-art models trained with 3D annotations.
Model-based 3D Hand Reconstruction via Self-Supervised Learning
TLDR
This work proposes S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint and utilizes the consistency between 2D and 3D representations and a set of novel losses to rationalize outputs of the neural network.
Silhouette-Net: 3D Hand Pose Estimation from Silhouettes
TLDR
A new architecture is presented that automatically learns guidance from implicit depth perception and resolves the ambiguity of hand pose through end-to-end training, serving as the state of the art of its kind for estimating 3D hand poses from silhouettes.
Multi-Person Absolute 3D Human Pose Estimation with Weak Depth Supervision
TLDR
This work introduces a network that can be trained with additional RGB-D images in a weakly supervised fashion, and achieves state-of-the-art results on the MuPoTS-3D dataset by a considerable margin.
Aligning Latent Spaces for 3D Hand Pose Estimation
TLDR
This work proposes to learn a joint latent representation that leverages other modalities as weak labels to boost the RGB-based hand pose estimator and significantly outperforms state-of-the-art on two public benchmarks.
Two-hand Global 3D Pose Estimation using Monocular RGB
TLDR
A novel multi-stage convolutional neural network based pipeline that accurately segments and locates the hands despite occlusion between two hands and complex background noise and estimates the 2D and 3D canonical joint locations without any depth information is proposed.
...

References

SHOWING 1-10 OF 45 REFERENCES
Learning to Estimate 3D Hand Pose from Single RGB Images
TLDR
A deep network is proposed that learns a network-implicit 3D articulation prior that yields good estimates of the 3D pose from regular RGB images, and a large scale 3D hand pose dataset based on synthetic hand models is introduced.
Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs
TLDR
This work proposes to first project the query depth image onto three orthogonal planes and utilize these multi-view projections to regress 2D heat-maps that estimate the joint positions on each plane, producing the final 3D hand pose estimate with learned pose priors.
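The multi-view projection step described above can be sketched as follows: project a normalized 3D point cloud onto the three orthogonal planes (xy, yz, xz) as occupancy maps. This is a toy illustration under assumed conventions (points in [0, 1), binary occupancy rather than the paper's depth-valued projections); `orthogonal_projections` is a hypothetical name.

```python
import numpy as np

def orthogonal_projections(points, grid=16):
    """Project a normalized 3D point cloud (N, 3), coordinates in
    [0, 1), onto the xy, yz, and xz planes as binary occupancy maps,
    the kind of multi-view inputs the method above regresses
    heat-maps from."""
    maps = {k: np.zeros((grid, grid)) for k in ("xy", "yz", "xz")}
    idx = np.clip((np.asarray(points) * grid).astype(int), 0, grid - 1)
    for x, y, z in idx:
        maps["xy"][y, x] = 1  # front view: drop z
        maps["yz"][z, y] = 1  # side view: drop x
        maps["xz"][z, x] = 1  # top view: drop y
    return maps
```

Each map discards one coordinate, so a network can regress per-plane 2D heat-maps that are later fused into a 3D estimate.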
Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach
TLDR
A weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network, presenting a two-stage cascaded structure to regularize the 3D pose prediction, which is effective in the absence of ground-truth depth labels.
Using a Single RGB Frame for Real Time 3D Hand Pose Estimation in the Wild
TLDR
This work capitalizes on the latest advancements in deep learning, combining them with the power of generative hand pose estimation techniques to achieve real-time monocular 3D hand pose estimation in unrestricted scenarios.
Egocentric hand pose estimation and distance recovery in a single RGB image
TLDR
This paper demonstrates the possibility of recovering both the articulated hand pose and its distance from the camera with a single RGB camera in egocentric view, with good performance on both a synthesized dataset and several real-world color image sequences captured in different environments.
Efficient Hand Pose Estimation from a Single Depth Image
  • Chi Xu, Li Cheng
  • Computer Science
    2013 IEEE International Conference on Computer Vision
  • 2013
TLDR
This work tackles the practical problem of hand pose estimation from a single noisy depth image, and proposes a dedicated three-step pipeline that is able to work with Kinect-type noisy depth images, and reliably produces pose estimations of general motions efficiently.
Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image
TLDR
An integrated approach is taken that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture and uses the knowledge of plausible 3D landmark locations to refine the search for better 2D locations.
How to Refine 3D Hand Pose Estimation from Unlabelled Depth Data?
TLDR
This work shows how a simple convolutional neural network can be trained to adapt to unlabelled depth images from a real user's hand, and validates the method on two existing datasets and a new dataset, demonstrating that it compares strongly with state-of-the-art methods.
3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images
TLDR
Experiments show that the proposed 3D CNN based approach outperforms state-of-the-art methods on two challenging hand pose datasets, and is very efficient as the implementation runs at over 215 fps on a standard computer with a single GPU.
GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB
TLDR
This work proposes a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network, and uses a neural network that translates synthetic images to "real" images, such that the so-generated images follow the same statistical distribution as real-world hand images.
...