• Corpus ID: 235390859

Adversarial Motion Modelling helps Semi-supervised Hand Pose Estimation

Adrian Spurr, Pavlo Molchanov, Umar Iqbal, Jan Kautz, Otmar Hilliges
Hand pose estimation is difficult due to different environmental conditions, object- and self-occlusion as well as diversity in hand shape and appearance. Exhaustively covering this wide range of factors in fully annotated datasets has remained impractical, posing significant challenges for generalization of supervised methods. Embracing this challenge, we propose to combine ideas from adversarial training and motion modelling to tap into unlabeled videos. To this end we propose what to the best…

Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey
This survey presents a comprehensive analysis of 3D hand pose estimation from the perspective of efficient annotation and learning, and investigates annotation methods classified as manual, synthetic-model-based, hand-sensor-based, and computational approaches.
Multi-view Image-based Hand Geometry Refinement using Differentiable Monte Carlo Ray Tracing
An image-based refinement is achieved through differentiable ray tracing, a method that has not been employed so far to relevant problems and is hereby shown to be superior to the approximative alternatives that have been employed in the past.


VIBE: Video Inference for Human Body Pose and Shape Estimation
This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.
Weakly-Supervised Domain Adaptation via GAN and Mesh Model for Estimating 3D Hand Poses Interacting Objects
This work proposes a novel end-to-end trainable pipeline that adapts the hand-object domain to the single hand-only domain while learning for HPE; it significantly outperforms state-of-the-art methods trained on hand-only data and is comparable to those supervised by HOI data.
Cross-Modal Deep Variational Hand Pose Estimation
This work proposes a method to learn a statistical hand model represented by a cross-modal trained latent space via a generative deep neural network, which can be directly used to estimate 3D hand poses from RGB images, outperforming the state-of-the-art in different settings.
Leveraging Photometric Consistency Over Time for Sparsely Supervised Hand-Object Reconstruction
This work presents a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video and demonstrates that the approach allows us to improve the pose estimation accuracy by leveraging information from neighboring frames in low-data regimes.
GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB
This work proposes a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network, and uses a neural network that translates synthetic images to "real" images, such that the so-generated images follow the same statistical distribution as real-world hand images.
Disentangling Latent Hands for Image Synthesis and Pose Estimation
  • Linlin Yang, Angela Yao
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
Experiments show that the dVAE can synthesize highly realistic images of the hand specifiable by both pose and image background content and also estimate 3D hand poses from RGB images with accuracy competitive with state-of-the-art on two public benchmarks.
Human Motion Prediction via Spatio-Temporal Inpainting
This work argues that the L2 metric, considered so far by most approaches, fails to capture the actual distribution of long-term human motion, and proposes two alternative metrics, based on the distribution of frequencies, that are able to capture more realistic motion patterns.
Adversarial Geometry-Aware Human Motion Prediction
This work proposes a novel frame-wise geodesic loss as a geometrically meaningful, more precise distance measurement and presents a new learning procedure to simultaneously validate the sequence-level plausibility of the prediction and its coherence with the input sequence by introducing two global recurrent discriminators.
3D Hand Shape and Pose From Images in the Wild
This work presents the first end-to-end deep learning based method that predicts both 3D hand shape and pose from RGB images in the wild, consisting of the concatenation of a deep convolutional encoder, and a fixed model-based decoder.
Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints
This work proposes a set of novel losses that significantly reduce the depth ambiguity and allow the network to more effectively leverage additional 2D annotated images on the challenging FreiHAND dataset.