LASOR: Learning Accurate 3D Human Pose and Shape via Synthetic Occlusion-Aware Data and Neural Mesh Rendering

@article{Yang2021LASORLA,
  title={LASOR: Learning Accurate 3D Human Pose and Shape via Synthetic Occlusion-Aware Data and Neural Mesh Rendering},
  author={Kaibing Yang and Renshu Gu and Maoyu Wang and Masahiro Toyoura and Gang Xu},
  journal={IEEE Transactions on Image Processing},
  year={2021},
  volume={31},
  pages={1938--1948}
}
A key challenge in the task of human pose and shape estimation is occlusion, including self-occlusions, object-human occlusions, and inter-person occlusions. The lack of diverse and accurate pose and shape training data becomes a major bottleneck, especially for scenes with occlusions in the wild. In this paper, we focus on the estimation of human pose and shape in the case of inter-person occlusions, while also handling object-human occlusions and self-occlusion. We propose a novel framework… 

Occluded Human Body Capture with Self-Supervised Spatial-Temporal Motion Prior

The key idea is to employ non-occluded human data to learn a joint-level spatial-temporal motion prior for occluded humans with a self-supervised strategy; experimental results show that the approach generates accurate and coherent human motions with good generalization ability and runtime efficiency.

A Progressive Quadric Graph Convolutional Network for 3D Human Mesh Recovery

A Progressive Quadric Graph Convolutional Network (PQ-GCN) is proposed, and a simple and fast method for 3D human mesh recovery from a single image in the wild is designed, using 66% fewer parameters than the existing method, Pose2Mesh.

PLIKS: A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body Estimation

We consider the problem of reconstructing a 3D mesh of the human body from a single 2D image as a model-in-the-loop optimization problem. Existing approaches often regress the shape, pose, and

References

Object-Occluded Human Shape and Pose Estimation From a Single Color Image

This paper proposes a novel two-branch network architecture to train an end-to-end regressor via the latent feature supervision, which also includes a novel saliency map sub-net to extract the human information from object-occluded color images.

Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild

STRAPS (Synthetic Training for Real Accurate Pose and Shape), a system that utilises proxy representations, such as silhouettes and 2D joints, as inputs to a shape and pose regression neural network, which is trained with synthetic training data (generated on-the-fly during training using the SMPL statistical body model) to overcome data scarcity.

3DCrowdNet: 2D Human Pose-Guided 3D Crowd Human Pose and Shape Estimation in the Wild

3DCrowdNet, a 2D human pose-guided 3D crowd pose and shape estimation system for in-the-wild scenes, leverages the robust 2D pose outputs from off-the-shelf 2D pose estimators, which guide a network to focus on a target person and provide essential human articulation information.

DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare

A novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image is proposed, and a large-scale synthetic dataset utilizing web-crawled Mocap sequences, 3D scans and animations is constructed.

Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision

We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly

Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints

This paper leverages state-of-the-art deep multi-task neural networks and parametric human and scene modeling towards a fully automatic monocular visual sensing system for multiple interacting people, which infers the 2D and 3D pose and shape of multiple people from a single image.

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

This paper proposes a temporal regression network with a gated convolution module that transforms 2D joints to 3D while recovering the missing occluded joints, and shows that the proposed method outperforms most state-of-the-art 2D-to-3D pose estimation methods, especially in scenarios with heavy occlusions.

Learning to Estimate 3D Human Pose and Shape from a Single Color Image

This work addresses the problem of estimating the full body 3D human pose and shape from a single color image and proposes an efficient and effective direct prediction method based on ConvNets, incorporating a parametric statistical body shape model (SMPL) within an end-to-end framework.

End-to-End Recovery of Human Shape and Pose

This work introduces an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.

Estimating Human Pose from Occluded Images

Experimental results on synthetic and real data sets bear out the theory that with sparse representation 3D human pose can be robustly estimated when humans are partially or heavily occluded in the scenes.