Efficient Human Pose Estimation by Learning Deeply Aggregated Representations

Zhengxiong Luo, Zhicheng Wang, Yuanhao Cai, Guan'an Wang, Yan Huang, Liang Wang, Erjin Zhou, Tieniu Tan, Jian Sun
2021 IEEE International Conference on Multimedia and Expo (ICME)
In this paper, we propose an efficient human pose estimation network (DANet) by learning deeply aggregated representations. Most existing models explore multi-scale information mainly from features with different spatial sizes. Powerful multi-scale representations usually rely on the cascaded pyramid framework. This framework largely boosts performance but at the same time makes networks very deep and complex. Instead, we focus on exploiting multi-scale information from layers with…
Adaptive Dilated Convolution For Human Pose Estimation
An adaptive dilated convolution (ADC) is proposed that generates and fuses multi-scale features of the same spatial size by setting different dilation rates for different channels. This lets ADC adaptively adjust the fused scales, so it may generalize better to various human sizes.
MEMe: A Mutually Enhanced Modeling Method for Efficient and Effective Human Pose Estimation
This paper proposes MEMe, a method that reconstructs a lightweight baseline model, EffBase, transferred intuitively from EfficientDet, into the efficient and effective pose (EEffPose) net, which contains three mutually enhanced modules: the Enhanced EffNet (EEffNet) backbone, the total fusion neck (TFNeck), and the final attention head (FAHead).
Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training
Qualitative validation shows that noise generated by PNGAN is highly similar to real noise in terms of intensity and distribution, which suggests a high similarity between PNGAN-generated and real noisy images.
RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark
A novel Transformer-based Generative Adversarial Network (RFormer) is proposed to restore the real degradation of clinical fundus images and significantly outperforms the state-of-the-art (SOTA) methods.
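
The adaptive dilated convolution entry above describes fusing multi-scale features of the same spatial size by giving different channels different dilation rates. The following is a minimal 1D sketch of that general idea in plain Python, not the authors' implementation: the dilation rates here are fixed rather than learned or adapted, fusion is a simple average, and the function names `dilated_conv1d` and `multi_rate_fuse` are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D dilated cross-correlation with zero padding."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_rate_fuse(channels, kernel, dilations):
    """Apply one dilation rate per channel, then fuse by averaging.

    Assumption: averaging stands in for whatever fusion the real
    method uses; every output keeps the input's spatial size.
    """
    if len(channels) != len(dilations):
        raise ValueError("need one dilation rate per channel")
    outputs = [dilated_conv1d(c, kernel, d) for c, d in zip(channels, dilations)]
    n = len(outputs)
    return [sum(vals) / n for vals in zip(*outputs)]


impulse = [0, 0, 0, 1, 0, 0, 0]
fused = multi_rate_fuse([impulse, impulse], [1, 1, 1], dilations=[1, 2])
print(fused)
```

Because every channel is padded to the input length, features with different receptive fields can be combined elementwise without any resizing step.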


Multi-context Attention for Human Pose Estimation
This paper proposes to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation and designs novel Hourglass Residual Units (HRUs) to increase the receptive field of the network.
Learning Delicate Local Representations for Multi-Person Pose Estimation
This paper proposes an efficient attention mechanism, the Pose Refine Machine (PRM), to make a trade-off between local and global representations in output features and further refine the keypoint locations.
Learning Feature Pyramids for Human Pose Estimation
This work designs Pyramid Residual Modules (PRMs) to enhance the scale invariance of DCNNs and provides a theoretical derivation that extends the current weight initialization scheme to multi-branch network structures.
Multi-Scale Structure-Aware Network for Human Pose Estimation
A robust multi-scale structure-aware neural network for human pose estimation is proposed that improves on state-of-the-art pose estimation methods, which struggle with scale variation, occlusion, and complex multi-person scenarios.
Cascaded Pyramid Network for Multi-person Pose Estimation
A novel network structure called the Cascaded Pyramid Network (CPN) is presented, which aims to relieve the problem of "hard" keypoints and achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0.
Deep High-Resolution Representation Learning for Human Pose Estimation
This paper proposes a network that maintains high-resolution representations through the whole process of human pose estimation and empirically demonstrates the effectiveness of the network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information
Two novel modules are proposed to enhance information for multi-person pose estimation, adopting the channel shuffle operation on feature maps at different levels to promote cross-channel communication among the pyramid feature maps.
Deep Feature Pyramid Reconfiguration for Object Detection
A novel reconfiguration architecture is proposed to combine low-level representations with high-level semantic features in a highly nonlinear yet efficient way, gathering task-oriented features across different spatial locations and scales, both globally and locally.
Feature Pyramid Networks for Object Detection
This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Fast Human Pose Estimation
This work investigates the under-studied but practically critical pose model efficiency problem, and presents a new Fast Pose Distillation (FPD) model learning strategy that trains a lightweight pose neural network architecture capable of executing rapidly with low computational cost.
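
The entry for "Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information" above mentions the channel shuffle operation. As a rough illustration, the generic shuffle (as popularized by ShuffleNet) can be sketched in plain Python as follows. This shows only the reordering itself, under the assumption that channels are held in a flat list. It does not reproduce how that paper applies the operation across pyramid levels, and the name `channel_shuffle` is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Conceptually: reshape the flat channel list to (groups, per_group),
    transpose to (per_group, groups), and flatten again, so that each
    run of consecutive output channels draws from every group.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Six channel labels split into two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
print(shuffled)  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, a subsequent grouped operation sees channels from every original group, which is what allows information to flow between groups.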