Corpus ID: 236428458

Hand Image Understanding via Deep Multi-Task Learning

  • Xiong Zhang, Hongsheng Huang, Jianchao Tan, Hongmin Xu, Cheng Yang, Guozhu Peng, Lei Wang, Ji Liu
  • Published 24 July 2021
  • Computer Science
  • ArXiv
Stem Module. The stem module consists of two 7×7 convolutional layers with stride 2, with the channels set to 64 and 128, respectively.
Encoder. We employ the main body of ResNet-50 [5] to implement the encoder. Specifically, the initial conv1 and the prediction head are removed, while the remaining conv2_x, conv3_x, conv4_x, and conv5_x are adopted to build the encoder module, with 3, 4, 5, and 6 repetitions, respectively.
Heat-Map Decoder. The heat-map decoder…
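As a rough illustration of the stem module described above, the following sketch traces how two 7×7, stride-2 convolutions change the spatial size of a feature map. The padding of 3 and the 256×256 input are assumptions made for this example only; the excerpt does not state them. The helper names `conv_out_size` and `trace_stem` are hypothetical.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution, using floor division."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_stem(input_size: int, padding: int = 3):
    """Return (channels, spatial size) after each of the two stem layers.

    Kernel size, stride, and channel counts follow the excerpt;
    the padding value is an assumption, not taken from the paper.
    """
    # (kernel, stride, out_channels) for each stem convolution.
    stem_layers = [(7, 2, 64), (7, 2, 128)]
    shapes = []
    size = input_size
    for kernel, stride, channels in stem_layers:
        size = conv_out_size(size, kernel, stride, padding)
        shapes.append((channels, size))
    return shapes


if __name__ == "__main__":
    # Assumed 256x256 input; each stride-2 layer halves the resolution.
    for channels, size in trace_stem(256):
        print(f"{channels} channels, {size}x{size}")
```

Under these assumptions the stem reduces the resolution by a factor of four before the encoder stages; a different padding or input size would change the exact numbers.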


3D Hand Shape and Pose From Images in the Wild
This work presents the first end-to-end deep learning based method that predicts both 3D hand shape and pose from RGB images in the wild, consisting of the concatenation of a deep convolutional encoder, and a fixed model-based decoder.
LCR-Net: Localization-Classification-Regression for Human Pose
This work proposes an end-to-end architecture for joint 2D and 3D human pose estimation in natural images that significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
This work addresses the task of semantic image segmentation with deep learning. It proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image
A novel pixel-wise prediction-based method for 3D hand pose estimation that achieves new state-of-the-art accuracy while running efficiently at around 110 fps on a single NVIDIA 1080Ti GPU.
Deep Multitask Architecture for Integrated 2D and 3D Human Sensing
A deep multitask architecture for fully automatic 2D and 3D human sensing (DMHS), including recognition and reconstruction, in monocular images, is proposed. It shows that in the wild the monocular RGB architecture is perceptually competitive with a state-of-the-art (commercial) Kinect system based on RGB-D data.
CrossInfoNet: Multi-Task Information Sharing Based Hand Pose Estimation
The proposed CrossInfoNet decomposes the hand pose estimation task into a palm pose estimation sub-task and a finger pose estimation sub-task, and adopts a two-branch cross-connection structure to share beneficial complementary information between the sub-tasks.
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks
A novel graph-based method to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections, where domain knowledge about the human hand (body) configurations is explicitly incorporated into the graph convolutional operations to meet the specific demand of the 3D pose estimation.
Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs
This work proposes to first project the query depth image onto three orthogonal planes and to use these multi-view projections to regress 2D heat-maps that estimate the joint positions on each plane, which are then fused with learned pose priors to produce the final 3D hand pose estimate.
Cross-Modal Deep Variational Hand Pose Estimation
This work proposes a method to learn a statistical hand model represented by a cross-modal trained latent space via a generative deep neural network, which can be directly used to estimate 3D hand poses from RGB images, outperforming the state of the art in different settings.
Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild
We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct…