Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information

@article{Su2019MultiPersonPE,
  title={Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information},
  author={Kai Su and Dongdong Yu and Zhenqi Xu and Xin Geng and Changhu Wang},
  journal={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019},
  pages={5667-5675}
}
  • Kai Su, Dongdong Yu, Zhenqi Xu, Xin Geng, Changhu Wang
  • Published 9 May 2019
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Multi-person pose estimation is an important but challenging problem in computer vision. Although current approaches have achieved significant progress by fusing the multi-scale feature maps, they pay little attention to enhancing the channel-wise and spatial information of the feature maps. In this paper, we propose two novel modules to perform the enhancement of the information for the multi-person pose estimation. First, a Channel Shuffle Module (CSM) is proposed to adopt the channel shuffle… 
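For orientation, the channel shuffle operation that the abstract names can be sketched generically as follows. This is a minimal illustration of the generic operation (as popularized by ShuffleNet), not the paper's Channel Shuffle Module, whose details are cut off in the abstract above. The function names and the use of plain Python lists to stand in for feature-map channels are assumptions made for this example.

```python
def channel_shuffle_order(num_channels, groups):
    """Return the channel index order produced by a generic channel shuffle.

    Channels are viewed as a (groups, channels_per_group) grid, the grid is
    transposed, and the result is flattened, so that each consecutive block
    of output channels draws one channel from every input group.
    """
    if groups <= 0 or num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by a positive groups")
    per_group = num_channels // groups
    # Walk the transposed grid: position within a group first, then group.
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


def channel_shuffle(channels, groups):
    """Reorder a list of per-channel feature maps (hypothetical helper)."""
    order = channel_shuffle_order(len(channels), groups)
    return [channels[i] for i in order]


if __name__ == "__main__":
    # Six channels in two groups: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle_order(6, 2))
```

With six channels split into two groups, the shuffled order interleaves the groups, so a subsequent grouped operation would see channels from both original groups.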

Multi-Person Pose Estimation Based on Hierarchical Residual-Like Connections

Two novel modules are presented to enhance the multi-scale feature and increase the range of receptive fields by constructing hierarchical residual-like connections in pyramid feature maps.

Human pose estimation with gated multi-scale feature fusion and spatial mutual information

A new structure of gated multi-scale feature fusion (GMSFF) is proposed, which aims to selectively import high-level features to make up for the missing semantic information of low-resolution feature maps.

OCPAN: multi‐person pose estimation with more attention on residual information

The authors present an optimised cascaded pyramid attention network composed of two novel modules to reduce the redundant information and highlight the residual information in the bottleneck for more accurate results.

CFENet: Content-aware feature enhancement network for multi-person pose estimation

Comprehensive experiments demonstrate that the proposed approach outperforms most of the popular methods and achieves a competitive performance with the state-of-the-art methods over three benchmark datasets: the recent big dataset CrowdPose, the COCO keypoint detection dataset and the MPII Human Pose dataset.

Multi-person Pose Estimation with Object Occlusion Information

Two novel modules are presented that group invisible keypoints occluded by various objects into the right body part, and experimental results show that the model achieves better performance and faster inference than most previous methods.

Scale-aware attention-based multi-resolution representation for multi-person pose estimation

A novel network, the ‘Scale-aware attention-based multi-resolution representation network’ (SaMr-Net), is proposed to make the method robust to scale variation and to prevent the loss of detail information in upsampling, leading to more precise keypoint estimation.

Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection

A novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation outperforms the state-of-the-art methods and achieves superior performance on three benchmark datasets: the recent large CrowdPose dataset, the COCO keypoint detection dataset and the MPII Human Pose dataset.

Fixed-resolution representation network for human pose estimation

A novel architecture named fixed-resolution representation network for human pose estimation, which maintains fixed resolution through the whole process to keep rich spatial-structural information, is proposed.

Multistage Polymerization Network for Multiperson Pose Estimation

A multistage polymerization network (MPN) for multiperson pose estimation that continuously learns rich underlying spatial information by fusing features within the layers and adds hierarchical connections between feature maps at the same resolution for interlayer fusion.
...

References

Cascaded Pyramid Network for Multi-person Pose Estimation

A novel network structure called Cascaded Pyramid Network (CPN) is presented, which aims to relieve the problem of "hard" keypoints and achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0.

Towards Accurate Multi-person Pose Estimation in the Wild

This work proposes a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task by using a novel form of keypoint-based Non-Maximum-Suppression (NMS), instead of the cruder box-level NMS, and by introducing a novel aggregation procedure to obtain highly localized keypoint predictions.

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn

Multi-context Attention for Human Pose Estimation

This paper proposes to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation and designs novel Hourglass Residual Units (HRUs) to increase the receptive field of the network.

Pose-Invariant Embedding for Deep Person Re-Identification

This paper introduces pose-invariant embedding (PIE) as a pedestrian descriptor and shows that PoseBox alone yields decent re-ID accuracy and that when integrated in the PBF network, the learned PIE descriptor produces competitive performance compared with state-of-the-art approaches.

Deep Network for the Integrated 3D Sensing of Multiple People in Natural Images

A multi-task deep neural network with differentiable stages where the person grouping problem is formulated as an integer program based on learned body part scores parameterized by both 2d and 3d information.

An Approach to Pose-Based Action Recognition

This work improves a state-of-the-art method for estimating human joint locations from videos and incorporates additional segmentation cues and temporal constraints to select the "best" one, which is able to localize body joints more accurately than existing methods.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Feature Pyramid Networks for Object Detection

This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning

This paper introduces a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channel-wise Attentions in a CNN that significantly outperforms state-of-the-art visual attention-based image captioning methods.