InsPose: Instance-Aware Networks for Single-Stage Multi-Person Pose Estimation

@inproceedings{shi2021inspose,
  title={InsPose: Instance-Aware Networks for Single-Stage Multi-Person Pose Estimation},
  author={Dahu Shi and Xing Wei and Xiaodong Yu and Wenming Tan and Ye Ren and Shiliang Pu},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021},
}
Multi-person pose estimation is an attractive and challenging task. Existing methods are mostly based on two-stage frameworks, which include top-down and bottom-up approaches. Two-stage methods either incur high computational redundancy from an additional person detector, or must heuristically group keypoints after predicting all the instance-agnostic keypoints. The single-stage paradigm aims to simplify the multi-person pose estimation pipeline and has received considerable attention. However…

Cascaded Pyramid Network for Multi-person Pose Estimation
A novel network structure called Cascaded Pyramid Network (CPN) is presented, which aims to relieve the difficulty of these "hard" keypoints and achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0.
Single-Stage Multi-Person Pose Machines
The first single-stage model, the Single-stage multi-person Pose Machine (SPM), is presented to simplify the pipeline and improve the efficiency of multi-person pose estimation, and a novel Structured Pose Representation (SPR) is proposed that unifies person instance and body joint position representations.
FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions
The experimental results show that FCPose is a simple yet effective multi-person pose estimation framework that offers a better speed/accuracy trade-off than other state-of-the-art methods.
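The dynamic instance-aware convolutions mentioned above (and in the InsPose title) can be illustrated with a minimal sketch: each detected person gets its own small set of convolution filters, predicted by the network, which are applied to a feature map shared across all instances. The function name, array shapes, and the use of 1x1 filters here are illustrative assumptions, not the papers' exact design.

```python
import numpy as np

def dynamic_instance_heatmaps(features, instance_params, num_keypoints=17):
    """Sketch of a dynamic instance-aware convolution head.

    features:        (C, H, W) feature map shared by all instances
    instance_params: (N, num_keypoints, C) per-instance 1x1 conv weights,
                     assumed to be predicted by a controller branch
    returns:         (N, num_keypoints, H, W) per-instance keypoint heatmaps
    """
    C, H, W = features.shape
    flat = features.reshape(C, H * W)            # (C, H*W)
    # A 1x1 convolution is a matrix multiply: each instance applies
    # its own weights to the same shared features.
    heatmaps = instance_params @ flat            # (N, K, H*W)
    return heatmaps.reshape(-1, num_keypoints, H, W)

# Toy usage: 2 detected people, 8-channel features on a 4x4 grid
feats = np.random.randn(8, 4, 4)
params = np.random.randn(2, 17, 8)
out = dynamic_instance_heatmaps(feats, params)
assert out.shape == (2, 17, 4, 4)
```

Because the filters differ per instance, the head localizes keypoints for each person separately without bounding-box cropping or post-hoc grouping, which is the appeal of the single-stage formulation.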
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
The proposed PersonLab model tackles both semantic-level reasoning and object-part associations using part-based modeling, and employs a convolutional network which learns to detect individual keypoints and predict their relative displacements, allowing us to group keypoints into person pose instances. Expand
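The geometric grouping described above can be sketched as follows: starting from each detected root keypoint, a predicted relative-displacement vector points to where the other joints of the same person should be. The function signature and array layout below are illustrative assumptions, not PersonLab's actual decoding procedure (which is iterative and score-based).

```python
import numpy as np

def group_keypoints(root_positions, displacement_fields):
    """Sketch of grouping keypoints into person instances via displacements.

    root_positions:      (N, 2) integer (y, x) locations of detected roots
    displacement_fields: (K, 2, H, W) per-joint displacement maps (dy, dx),
                         assumed to be predicted relative to the root
    returns:             (N, K, 2) joint coordinates grouped per person
    """
    K = displacement_fields.shape[0]
    poses = []
    for (y, x) in root_positions:
        joints = []
        for k in range(K):
            # Read the displacement stored at the root location and
            # follow it to the k-th joint of the same person.
            dy, dx = displacement_fields[k, :, y, x]
            joints.append((y + dy, x + dx))
        poses.append(joints)
    return np.array(poses)

# Toy usage: 1 person, 17 joints on a 4x4 grid
roots = np.array([[1, 2]])
disps = np.zeros((17, 2, 4, 4))
poses = group_keypoints(roots, disps)
assert poses.shape == (1, 17, 2)
```

This is what lets a bottom-up method assign instance-agnostic keypoint detections to people without a separate person detector.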
DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model
The goal of this paper is to advance the state of the art of articulated pose estimation in scenes with multiple people. To that end we contribute on three fronts. We propose (1) improved body part…
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
HigherHRNet is presented, a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids; it surpasses all top-down methods on the CrowdPose test set and achieves a new state-of-the-art result on COCO test-dev, suggesting its robustness in crowded scenes.
Pose-native Network Architecture Search for Multi-person Human Pose Estimation
The Pose-native Network Architecture Search (PoseNAS) is presented to simultaneously design a better pose encoder and pose decoder for pose estimation, achieving state-of-the-art performance on three public datasets (MPII, COCO, and PoseTrack) with fewer parameters than existing methods.
DGCN: Dynamic Graph Convolutional Network for Efficient Multi-Person Pose Estimation
This paper proposes a novel Dynamic Graph Convolutional Module (DGCM) that takes all relations into account and constructs dynamic graphs to tolerate large variations of human pose, achieving relative gains over state-of-the-art bottom-up methods on the COCO keypoints and MPII datasets.
DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation
An approach is proposed that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity to each other.
DirectPose: Direct End-to-End Multi-Person Pose Estimation
The first direct end-to-end multi-person pose estimation framework, termed DirectPose, is proposed, which directly predicts instance-aware keypoints for all the instances from a raw input image, eliminating the need for heuristic grouping in bottom-up methods or bounding-box detection and RoI operations in top-down ones. Expand