Rethinking Deconvolution for 2D Human Pose Estimation Light yet Accurate Model for Real-time Edge Computing

  • Masayuki Yamazaki, Eigo Mori
  • Published 8 November 2021
  • Computer Science
  • 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)
In this study, we present a pragmatic lightweight pose estimation model. Our model can achieve real-time predictions on low-power embedded devices. It reaches 94.5% of the accuracy of the SOTA HRNet (256x192 input) at only 3.8% of its computational cost on the COCO test dataset. Our model adopts an encoder-decoder architecture and is carefully downsized to improve its efficiency. We especially focused on optimizing the deconvolution layers and observed that the…
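To make the cost of deconvolution layers concrete, the sketch below computes the output size, parameter count, and multiply-accumulate count of a stack of transposed convolutions. The layer configuration (three 4x4 deconvolutions with stride 2, padding 1, and 256 channels on a 2048-channel 8x6 feature map) is an assumption borrowed from a SimpleBaseline-style decoder for illustration only; the excerpt does not state the configuration used by the authors, and the function names are hypothetical.

```python
def deconv_output_size(size, kernel, stride, padding, output_padding=0):
    """Spatial output size of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def deconv_params(in_ch, out_ch, kernel, bias=False):
    """Number of weights in a dense (non-grouped) transposed convolution."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def deconv_macs(in_h, in_w, in_ch, out_ch, kernel):
    """Multiply-accumulates: each input pixel is spread over kernel*kernel outputs."""
    return in_h * in_w * in_ch * out_ch * kernel * kernel


def decoder_summary(in_h, in_w, channels, kernel=4, stride=2, padding=1):
    """Walk a stack of deconvolutions; channels = [input_ch, layer1_ch, ...].

    Returns one (out_ch, out_h, out_w, params, macs) tuple per layer.
    """
    h, w = in_h, in_w
    rows = []
    for cin, cout in zip(channels[:-1], channels[1:]):
        params = deconv_params(cin, cout, kernel)
        macs = deconv_macs(h, w, cin, cout, kernel)
        h = deconv_output_size(h, kernel, stride, padding)
        w = deconv_output_size(w, kernel, stride, padding)
        rows.append((cout, h, w, params, macs))
    return rows


if __name__ == "__main__":
    # Assumed setup: 256x192 input downsampled by 32 gives an 8x6 feature map.
    for cout, h, w, params, macs in decoder_summary(8, 6, [2048, 256, 256, 256]):
        print(f"out={cout}x{h}x{w}  params={params:,}  MACs={macs:,}")
```

Under these assumed settings the first deconvolution alone holds about 8.4 million weights, which suggests why the decoder is a natural target when downsizing an encoder-decoder pose model.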


Iterative Pruning-based Model Compression for Pose Estimation on Resource-constrained Devices
This paper proposes a pruning-based model compression scheme that aims for an efficient model strong in both accuracy and inference time on resource-limited embedded devices, and develops a resource-efficient 2D pose estimation model based on HRNet.


Simple and Lightweight Human Pose Estimation
This paper redesigns a lightweight bottleneck block with two non-novel concepts, depthwise convolution and an attention mechanism, and presents a Lightweight Pose Network (LPN) following the architecture design principles of SimpleBaseline.
EfficientPose: Efficient Human Pose Estimation with Neural Architecture Search
This paper proposes an efficient framework for human pose estimation consisting of two parts, an efficient backbone and an efficient head. It applies differentiable neural architecture search to customize the backbone network design for pose estimation, reducing the computation cost with negligible accuracy degradation.
Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose
In this work we adapt multi-person pose estimation architecture to use it on edge devices. We follow the bottom-up approach from OpenPose, the winner of COCO 2016 Keypoints Challenge, because of its
Integral Knowledge Distillation for Multi-Person Pose Estimation
A novel compact and lightweight framework for training more efficient estimators using knowledge distillation, which achieves performance competitive with state-of-the-art methods while consuming only 35% of the model parameters and GFLOPs of the authors' baseline (SimpleBaseline-ResNet-50) on the COCO dataset.
Distribution-Aware Coordinate Representation for Human Pose Estimation
This work finds that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant for performance, and formulates a novel Distribution-Aware coordinate Representation of Keypoints (DARK) method.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated by scaling up MobileNets and ResNet.
Deep High-Resolution Representation Learning for Human Pose Estimation
This paper proposes a network that maintains high-resolution representations through the whole process of human pose estimation and empirically demonstrates the effectiveness of the network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
Stacked Hourglass Networks for Human Pose Estimation
This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.
Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields
We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.