3D Human Pose, Shape and Texture from Low-Resolution Images and Videos

Xiangyu Xu, Hao Chen, Francesc Moreno-Noguer, László A. Jeni, and Fernando De la Torre. IEEE Transactions on Pattern Analysis and Machine Intelligence.
3D human pose and shape estimation from monocular images has been an active research area in computer vision. Existing deep learning methods for this task rely on high-resolution input, which, however, is not always available in scenarios such as video surveillance and sports broadcasting. Two common approaches to dealing with low-resolution images are to apply super-resolution techniques to the input, which may produce unpleasant artifacts, or simply to train one model for each resolution…
Body Size and Depth Disambiguation in Multi-Person Reconstruction from Single Images
This work devises a novel optimization scheme that learns the appropriate body scale and relative camera pose by enforcing that the feet of all people remain on the ground floor. It robustly estimates the body translation and shape of multiple people while retrieving their spatial arrangement.
3D Human Texture Estimation from a Single Image with Transformers
A Transformer-based framework for 3D human texture estimation from a single image is proposed. It effectively exploits the global information of the input image, overcoming the limitations of existing methods that rely solely on convolutional neural networks.
Tracking People with 3D Representations
A method is developed that, in addition to extracting the 3D geometry of the person as a SMPL mesh, also extracts appearance as a texture map on the triangles of the mesh. This texture map serves as a 3D appearance representation that is robust to viewpoint and pose changes.


3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning
A novel algorithm called RSC-Net is proposed. It consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme, and it learns 3D body shape and pose across different resolutions with a single model.
Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision
We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly…
Learning to Estimate 3D Human Pose and Shape from a Single Color Image
This work addresses the problem of estimating full-body 3D human pose and shape from a single color image and proposes an efficient and effective direct prediction method based on ConvNets, incorporating a parametric statistical body shape model (SMPL) within an end-to-end framework.
Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints
This paper leverages state-of-the-art deep multi-task neural networks and parametric human and scene modeling toward a fully automatic monocular visual sensing system for multiple interacting people, which infers the 2D and 3D pose and shape of multiple people from a single image.
Learning from Synthetic Humans
This work presents SURREAL (Synthetic hUmans foR REAL tasks), a new large-scale dataset of synthetically generated but realistic images of people rendered from 3D sequences of human motion capture data, and shows that CNNs trained on this synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images.
VIBE: Video Inference for Human Body Pose and Shape Estimation
This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.
Learning to Reconstruct People in Clothing From a Single RGB Camera
We present Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving with a reconstruction accuracy of 4…
DeepHuman: 3D Human Reconstruction From a Single Image
DeepHuman, an image-guided volume-to-volume translation CNN for 3D human reconstruction from a single RGB image, leverages a dense semantic representation generated from the SMPL model as an additional input to reduce the ambiguities associated with reconstructing invisible areas.
Learning effective human pose estimation from inaccurate annotation
The method significantly increases pose estimation accuracy while reducing computational expense by a factor of 10, and the work contributes a dataset of 10,000 highly articulated poses.
End-to-End Recovery of Human Shape and Pose
This work introduces an adversary, trained on a large database of 3D human meshes, that tells whether human body shape and pose parameters are real or not, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.