Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

Hangting Ye, Wentao Zhu, Chun-yu Wang, Rujie Wu, Yizhou Wang
While voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras, they suffer from a heavy computational burden, especially for large scenes. We present Faster VoxelPose to address this challenge by re-projecting the feature volume onto the three two-dimensional coordinate planes and estimating the X, Y, and Z coordinates from them separately. To that end, we first localize each person with a 3D bounding box by estimating a 2D box and its height based on…
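The re-projection step described above can be illustrated with a minimal NumPy sketch. It assumes a feature volume laid out as (channels, X, Y, Z) and collapses each dropped axis by max-pooling. Both the layout and the choice of max-pooling are assumptions made for illustration, and the function name `project_to_planes` is hypothetical rather than taken from the paper's code.

```python
import numpy as np

def project_to_planes(volume):
    """Collapse a (C, X, Y, Z) feature volume onto the xy, xz and yz planes.

    Max-pooling along the dropped axis is an assumption made for this sketch.
    """
    xy = volume.max(axis=3)  # drop Z -> shape (C, X, Y)
    xz = volume.max(axis=2)  # drop Y -> shape (C, X, Z)
    yz = volume.max(axis=1)  # drop X -> shape (C, Y, Z)
    return xy, xz, yz

# Toy volume: one channel, a single peak at voxel (x=2, y=5, z=1).
volume = np.zeros((1, 4, 8, 3))
volume[0, 2, 5, 1] = 1.0

xy, xz, yz = project_to_planes(volume)

# Read X and Y from the xy plane and Z from the xz plane.
x, y = np.unravel_index(xy[0].argmax(), xy[0].shape)
_, z = np.unravel_index(xz[0].argmax(), xz[0].shape)
```

Taking a plain argmax only works here because the toy volume holds a single peak. It stands in for whatever estimator actually reads coordinates off each plane.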

VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild

VoxelTrack employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment and outperforms the state-of-the-art methods by a large margin on four public datasets.

VoxelPose: Towards Multi-camera 3D Human Pose Estimation in Wild Environment

An end-to-end solution that operates directly in 3D space, thereby avoiding incorrect decisions in 2D space, and outperforms the state of the art on public datasets.

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

This work presents the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. It shows that the approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low-quality commodity RGB cameras.

End-to-end Dynamic Matching Network for Multi-view Multi-person 3D Pose Estimation

This work proposes a novel matching algorithm that efficiently matches 2D poses from multiple views and is robust to incomplete and false 2D detections.

Cross View Fusion for 3D Human Pose Estimation

This work introduces a cross-view fusion scheme into a CNN to jointly estimate 2D poses for multiple views and presents a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses.

Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking

A deep Conditional Variational Autoencoder (CVAE) based model is proposed that synthesizes diverse, anatomically plausible 3D pose samples conditioned on the estimated 2D pose. The CVAE-based 3D pose sample set is shown to be consistent with the 2D pose and helps tackle the inherent ambiguity in 2D-to-3D lifting.

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

This paper decomposes the task of estimating the 3D human poses of multiple persons from multiple calibrated camera views into two stages, i.e. person localization and pose estimation, and proposes three task-specific graph neural networks for effective message passing.

Context Modeling in 3D Human Pose Estimation: A Unified Perspective

This work proposes ContextPose, which is based on an attention mechanism that allows enforcing soft limb length constraints in a deep network. It effectively reduces the chance of getting absurd 3D pose estimates with incorrect limb lengths and achieves state-of-the-art results on two benchmark datasets.

Lightweight Multi-View 3D Pose Estimation Through Camera-Disentangled Representation

A novel implementation of DLT is proposed that is orders of magnitude faster on GPU architectures than standard SVD-based triangulation methods, and outperforms or performs comparably to the state-of-the-art volumetric methods, while, unlike them, yielding real-time performance.
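For context, the standard SVD-based DLT triangulation that this summary uses as its point of comparison can be sketched as follows. This is the textbook baseline, not the faster implementation the paper proposes. The function name, the toy cameras, and the test point are all hypothetical.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Triangulate one 3D point from N views with SVD-based DLT.

    projections: sequence of 3x4 camera projection matrices.
    points_2d:   sequence of (u, v) image coordinates, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 matrix and dehomogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras (hypothetical): identity pose and a unit shift along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.5, 0.2, 4.0])

estimate = triangulate_dlt([P1, P2], [project(P1, point), project(P2, point)])
```

With noise-free observations, `estimate` recovers `point` up to floating-point error.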

Multiple human 3D pose estimation from multiview images

Experimental results indicate that the proposed method achieves substantial improvements over the existing state-of-the-art methods in terms of the probability of correct pose and the mean per joint position error performance measures.
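The mean per joint position error (MPJPE) named above is commonly computed as the average Euclidean distance between predicted and ground-truth joints. The sketch below assumes that standard definition and uses made-up joint coordinates. It does not reproduce any result from the paper.

```python
import math

def mpjpe(predicted, target):
    """Mean Euclidean distance between corresponding 3D joints."""
    if len(predicted) != len(target):
        raise ValueError("joint lists must have the same length")
    distances = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)

# Two toy joints: the first is off by 5 units, the second is exact.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)]

error = mpjpe(predicted, target)  # (5 + 0) / 2 = 2.5
```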