SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation

Authors: Jiabin Zhang, Zheng Zhu, Jiwen Lu, Junjie Huang, Guan Huang, Jie Zhou

Practical applications require both accuracy and efficiency from multi-person pose estimation algorithms. However, top-down methods dominate in accuracy, while bottom-up methods dominate in inference speed. To make a better trade-off between accuracy and efficiency, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE). Specifically, in the training process, we enable SIMPLE to… 


MultiPoseSeg: Feedback Knowledge Transfer for Multi-Person Pose Estimation and Instance Segmentation

MultiPoseSeg, a data preparation and feedback knowledge transfer system for multi-person pose estimation and instance segmentation, is proposed; it outperforms state-of-the-art bottom-up models in both accuracy and runtime performance.

DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation

A novel Dual-Pipeline Integrated Transformer (DPIT) integrates top-down and bottom-up pipelines to explore the visual clues of different receptive fields and exploit their complementarity; quantitative and qualitative results on two public datasets demonstrate the effectiveness of DPIT for human pose estimation.

Cofopose: Conditional 2D Pose Estimation with Transformers

Cofopose is a two-stage approach for 2D human pose estimation consisting of person-detection and keypoint-detection transformers. It uses conditional cross-attention, a conditional DEtection TRansformer, and an encoder-decoder in the transformer framework to perform person and keypoint detection.

Recent Advances of Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective

This article comprehensively summarizes the 2D and 3D representations of the human body and reviews the mainstream and milestone approaches for these representations since 2014 under unified frameworks, providing a holistic 2D-to-3D perspective.

Interactive Labeling for Human Pose Estimation in Surveillance Videos

This work combines multiple techniques into a single web-based general-purpose annotation application that enables annotators to interactively detect pedestrians, re-identify them throughout the sequence, estimate their poses, and correct annotation suggestions in the same interface.

A View Independent Classification Framework for Yoga Postures

This work uses transfer learning from human pose estimation models to extract 136 keypoints spread over the whole body, which are then used to train a random forest classifier that recognizes the Yogasanas.



HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

HigherHRNet is presented, a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids; it surpasses all top-down methods on CrowdPose test and achieves a new state-of-the-art result on COCO test-dev, suggesting its robustness in crowded scenes.
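HigherHRNet predicts heatmaps at more than one resolution, and at inference the paper describes upsampling them to the input resolution with bilinear interpolation and averaging them. A minimal NumPy sketch of that aggregation step, assuming (K, H, W) heatmap stacks and align-corners style sampling (both assumptions; `bilinear_resize` and `aggregate_heatmaps` are hypothetical helper names, not the authors' code):

```python
import numpy as np

def bilinear_resize(hm: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize a (K, H, W) heatmap stack to (K, out_h, out_w).

    Sample positions map the first/last output pixels onto the first/last
    input pixels (align-corners style), an assumption made for simplicity.
    """
    _, h, w = hm.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical interpolation weights
    wx = (xs - x0)[None, None, :]  # horizontal interpolation weights
    top = hm[:, y0][:, :, x0] * (1 - wx) + hm[:, y0][:, :, x1] * wx
    bot = hm[:, y1][:, :, x0] * (1 - wx) + hm[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def aggregate_heatmaps(heatmaps, out_h: int, out_w: int) -> np.ndarray:
    """Upsample every heatmap stack to (out_h, out_w) and average them."""
    resized = [bilinear_resize(hm, out_h, out_w) for hm in heatmaps]
    return np.mean(resized, axis=0)

# Usage: combine a 1/4-resolution and a 1/2-resolution prediction for a 256x256 input.
low = np.full((17, 64, 64), 0.2)
high = np.full((17, 128, 128), 0.6)
agg = aggregate_heatmaps([low, high], 256, 256)
```

Averaging after upsampling lets the coarse heatmaps contribute context while the higher-resolution ones sharpen localization for small people.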

Fast Human Pose Estimation

This work investigates the under-studied but practically critical pose model efficiency problem, and presents a new Fast Pose Distillation (FPD) model learning strategy that trains a lightweight pose neural network architecture capable of executing rapidly with low computational cost.
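The core of pose distillation is a loss that asks the lightweight student to mimic a stronger teacher's heatmaps while still fitting the ground truth. A minimal sketch of such a blended loss, assuming (K, H, W) heatmaps and a mixing weight `alpha`; the function names and the default of 0.5 are illustrative assumptions, not taken from the FPD release:

```python
import numpy as np

def heatmap_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Squared L2 distance per keypoint heatmap, averaged over the K keypoints."""
    per_kpt = ((pred - target) ** 2).reshape(pred.shape[0], -1).sum(axis=1)
    return float(per_kpt.mean())

def distillation_loss(student: np.ndarray, teacher: np.ndarray,
                      ground_truth: np.ndarray, alpha: float = 0.5) -> float:
    """Blend a mimicking term (student vs. teacher) with the supervised term."""
    l_mimic = heatmap_mse(student, teacher)         # follow the teacher's soft heatmaps
    l_gt = heatmap_mse(student, ground_truth)       # follow the labelled heatmaps
    return alpha * l_mimic + (1.0 - alpha) * l_gt

# Usage: a student predicting all zeros against a teacher predicting all ones.
s = np.zeros((2, 4, 4))
t = np.ones((2, 4, 4))
g = np.zeros((2, 4, 4))
loss = distillation_loss(s, t, g)
```

Setting `alpha` to 0 recovers ordinary supervised training; raising it shifts the student toward the teacher's softer, often more informative, heatmaps.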

DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model

The goal of this paper is to advance the state-of-the-art of articulated pose estimation in scenes with multiple people. To that end we contribute on three fronts. We propose (1) improved body part

MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network

On the COCO keypoints dataset, the pose estimation method outperforms all previous bottom-up methods both in accuracy and speed; it also performs on par with the best top-down methods while being at least 4x faster.

UniPose: Unified Human Pose Estimation in Single Images and Videos

  • Bruno Artacho, A. Savakis
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
The results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation, obtaining state-of-the-art results in single-person pose detection for both single images and videos.

Cascaded Pyramid Network for Multi-person Pose Estimation

A novel network structure called Cascaded Pyramid Network (CPN) is presented, which aims to handle "hard" keypoints and achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0.
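CPN focuses training on hard keypoints with online hard keypoint mining: compute a loss per keypoint and back-propagate only the largest ones. A minimal NumPy sketch of the selection step, assuming (K, H, W) heatmaps and an L2 loss; `ohkm_loss` is a hypothetical name and the default `top_k=8` should be treated as an assumed setting:

```python
import numpy as np

def ohkm_loss(pred: np.ndarray, target: np.ndarray, top_k: int = 8) -> float:
    """Online hard keypoint mining over (K, H, W) heatmaps.

    Computes an L2 loss per keypoint, then averages only the top_k largest,
    so easy keypoints are ignored for this sample.
    """
    per_kpt = ((pred - target) ** 2).reshape(pred.shape[0], -1).mean(axis=1)
    k = max(1, min(top_k, per_kpt.size))  # clamp to a valid number of keypoints
    hardest = np.sort(per_kpt)[-k:]       # the k largest per-keypoint losses
    return float(hardest.mean())

# Usage: keypoint i has per-keypoint loss i**2, so the two hardest are 4 and 9.
pred = np.zeros((4, 2, 2))
target = np.stack([np.full((2, 2), float(i)) for i in range(4)])
loss = ohkm_loss(pred, target, top_k=2)
```

Compared with averaging over all keypoints, this keeps occluded or ambiguous joints from being drowned out by the many easy ones.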

PifPaf: Composite Fields for Human Pose Estimation

The new PifPaf method, which uses a Part Intensity Field to localize body parts and a Part Association Field to associate body parts with each other to form full human poses, outperforms previous methods at low resolution and in crowded, cluttered and occluded scenes.

Towards Accurate Multi-person Pose Estimation in the Wild

This work proposes a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task by using a novel form of keypoint-based Non-Maximum-Suppression (NMS), instead of the cruder box-level NMS, and by introducing a novel aggregation procedure to obtain highly localized keypoint predictions.
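Keypoint-based NMS compares whole poses rather than boxes. A minimal sketch, assuming the Object Keypoint Similarity (OKS) form used in COCO keypoint evaluation, OKS = mean over labelled keypoints of exp(-d_i² / (2 s² k_i²)) with s² taken as the object area; the uniform `kappas` value, the 0.5 threshold, and the function names are placeholders, not the paper's settings:

```python
import numpy as np

def oks(pose_a: np.ndarray, pose_b: np.ndarray, area: float,
        kappas: np.ndarray, visible=None) -> float:
    """Object Keypoint Similarity between two (K, 2) keypoint arrays."""
    d2 = np.sum((pose_a - pose_b) ** 2, axis=1)          # squared distances
    sim = np.exp(-d2 / (2.0 * area * kappas ** 2))       # per-keypoint similarity
    if visible is None:
        visible = np.ones(len(sim), dtype=bool)
    if not visible.any():
        return 0.0
    return float(sim[visible].mean())

def keypoint_nms(poses, scores, areas, kappas, thresh: float = 0.5):
    """Greedy NMS: keep the best pose, drop later poses too similar to a kept one."""
    order = np.argsort(scores)[::-1]                     # highest score first
    keep = []
    for i in order:
        if all(oks(poses[j], poses[i], areas[j], kappas) <= thresh for j in keep):
            keep.append(int(i))
    return keep

# Usage: pose 1 nearly duplicates pose 0; pose 2 is a different person.
kappas = np.full(3, 0.1)
p0 = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
poses = [p0, p0 + 0.5, p0 + 100.0]
kept = keypoint_nms(poses, [0.9, 0.8, 0.7], [100.0, 100.0, 100.0], kappas)
```

Because OKS is normalized by object scale, two overlapping people with clearly different joint layouts can both survive, which box IoU would tend to suppress.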

Pose Partition Networks for Multi-person Pose Estimation

This paper proposes a novel Pose Partition Network (PPN) to address the challenging multi-person pose estimation problem and implements PPN with the Hourglass architecture as the backbone network to simultaneously learn a joint detector and a dense regressor.

Learning Delicate Local Representations for Multi-Person Pose Estimation

This paper proposes an efficient attention mechanism, the Pose Refine Machine (PRM), to balance local and global representations in output features and further refine the keypoint locations.