Corpus ID: 239050352

Self-Supervision and Spatial-Sequential Attention Based Loss for Multi-Person Pose Estimation

Haiyang Liu, Dingli Luo, Songlin Du, Takeshi Ikenaga
Bottom-up multi-person pose estimation approaches use heatmaps together with auxiliary predictions to estimate joint positions and the person each joint belongs to at the same time. Recently, various combinations of auxiliary predictions and heatmaps have been proposed for higher performance; these predictions are supervised directly by the corresponding L2 loss function. However, the lack of more explicit supervision results in low feature utilization and contradictions between predictions within one model. To solve… 
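To make the baseline concrete, the following is a minimal sketch of direct L2 supervision on a single joint heatmap, the setup the abstract describes as the starting point. It is not the loss proposed in the paper. The Gaussian-shaped target is an assumption (a common convention, not stated in the abstract), and the names `gaussian_heatmap` and `l2_loss` are hypothetical helpers introduced here for illustration only.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Return a height x width grid with a Gaussian peak at (cx, cy).

    Assumption: the ground-truth heatmap is a 2-D Gaussian centred on
    the joint; the abstract does not specify the target encoding.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def l2_loss(pred, target):
    """Mean squared error between two equally sized 2-D grids."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Ground truth: joint at the centre of a 5 x 5 grid.
target = gaussian_heatmap(5, 5, cx=2, cy=2)

# A perfect prediction gives zero loss; a peak shifted by one pixel does not.
perfect = l2_loss(target, target)
shifted = l2_loss(gaussian_heatmap(5, 5, cx=3, cy=2), target)
```

In this sketch each prediction map would be penalized independently in the same way, which gives the loss no term that relates one prediction to another.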


Resolution Irrelevant Encoding and Difficulty Balanced Loss Based Network Independent Supervision for Multi-Person Pose Estimation
Two network-independent supervision methods are proposed to improve joint localization accuracy and network training efficiency in multi-person pose estimation, with general applicability and high computational efficiency.
Multi-Scale Structure-Aware Network for Human Pose Estimation
A robust multi-scale structure-aware neural network for human pose estimation that effectively improves state-of-the-art pose estimation methods that suffer from difficulties with scale variation, occlusions, and complex multi-person scenarios.
Towards Accurate Multi-person Pose Estimation in the Wild
This work proposes a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task by using a novel form of keypoint-based Non-Maximum-Suppression (NMS), instead of the cruder box-level NMS, and by introducing a novel aggregation procedure to obtain highly localized keypoint predictions.
Cascaded Pyramid Network for Multi-person Pose Estimation
A novel network structure called Cascaded Pyramid Network (CPN) is presented, which aims to relieve the problem posed by "hard" keypoints and achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0.
Human Pose Estimation with Spatial Contextual Information
This work presents two conceptually simple yet computationally efficient modules, namely Cascade Prediction Fusion (CPF) and Pose Graph Neural Network (PGNN), to exploit underlying contextual information in human pose estimation.
Multi-context Attention for Human Pose Estimation
This paper proposes to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation and designs novel Hourglass Residual Units (HRUs) to increase the receptive field of the network.
Human Pose Estimation via Convolutional Part Heatmap Regression
A cascaded CNN architecture specifically designed for learning part relationships and spatial context, and for robustly inferring pose even under severe part occlusions, is proposed; it achieves top performance on the MPII and LSP datasets.
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
The proposed PersonLab model tackles both semantic-level reasoning and object-part associations using part-based modeling, and employs a convolutional network which learns to detect individual keypoints and predict their relative displacements, allowing us to group keypoints into person pose instances.
PifPaf: Composite Fields for Human Pose Estimation
The new PifPaf method, which uses a Part Intensity Field to localize body parts and a Part Association Field to associate body parts with each other to form full human poses, outperforms previous methods at low resolution and in crowded, cluttered and occluded scenes.
OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields
OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.