Human Synthesis and Scene Compositing
Mihai Zanfir, Elisabeta Oneata, A. Popa, Andrei Zanfir, Cristian Sminchisescu
Generating good-quality, geometrically plausible synthetic images of humans with control over appearance, pose, and shape parameters has become increasingly important for a variety of tasks ranging from photo editing and fashion virtual try-on to special effects and image compression. In this paper, we propose HUSC, a HUman Synthesis and Scene Compositing framework for the realistic synthesis of humans with different appearances, in novel poses and scenes. Central to our formulation…
Generating 3D People in Scenes Without People
The approach is able to synthesize realistic and expressive 3D human bodies that naturally interact with the 3D environment, which is useful for numerous applications, e.g. generating training data for human pose estimation, and in video games and VR/AR.
imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose
imGHUM is presented: the first holistic generative model of 3D human shape and articulated pose, represented as a signed distance function. It has attached spatial semantics, making it straightforward to establish correspondences between different shape instances and thus enabling applications that are difficult to tackle using classical implicit representations.
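The signed-distance representation described above can be illustrated with a toy example. The sphere geometry and the `shape_code` parameter below are hypothetical stand-ins for the learned imGHUM network, not its actual API; they only show what querying an implicit shape model looks like: negative inside the surface, zero on it, positive outside.

```python
import numpy as np

def toy_shape_sdf(points, shape_code):
    """Toy stand-in for a learned implicit model: the signed distance to a
    sphere whose radius is modulated by a scalar 'shape' code."""
    radius = 1.0 + 0.1 * shape_code           # shape code deforms the template
    return np.linalg.norm(points, axis=-1) - radius

# Query the field at a few 3D points for one shape instance.
pts = np.array([[0.0, 0.0, 0.0],   # centre: well inside the surface
                [1.0, 0.0, 0.0],   # on the unit sphere
                [2.0, 0.0, 0.0]])  # outside
d = toy_shape_sdf(pts, shape_code=0.0)
print(d)  # [-1.  0.  1.]
```

A real implicit human model replaces the analytic sphere with a neural network conditioned on shape and pose codes, but the query interface (3D point in, signed distance out) is the same.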
Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency
A self-supervised framework named SPICE (Self-supervised Person Image CrEation) that closes the image-quality gap with supervised methods and achieves state-of-the-art performance on the DeepFashion dataset.
AGORA: Avatars in Geography Optimized for Regression Analysis
An approach that extends the single-view SMPLify-X fitting to incorporate landmarks in multi-view images and optimizes the pose θ, shape β, and facial expression ψ of SMPL-X to match the observed 2D landmarks by minimizing a reprojection objective.
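The snippet above mentions a fitting objective without reproducing it. A generic multi-view reprojection objective of the kind SMPLify-X-style fitting minimizes can be sketched as follows (the weights and prior terms here are indicative, not AGORA's exact formulation):

$$E(\theta, \beta, \psi) = \sum_{v} \sum_{i} \rho\Big( \Pi_v\big(J_i(\theta, \beta, \psi)\big) - \hat{j}_{v,i} \Big) + \lambda_\theta\, E_{\text{pose}}(\theta) + \lambda_\beta\, \lVert \beta \rVert^2$$

where $J_i(\theta, \beta, \psi)$ is the $i$-th 3D joint of the posed body model, $\Pi_v$ projects it into view $v$, $\hat{j}_{v,i}$ is the corresponding detected 2D landmark, $\rho$ is a robust penalty, and the last two terms regularize pose and shape.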
Artificial Dummies for Urban Dataset Augmentation
An augmentation method for controlled synthesis of urban scenes containing people is described, producing rare or never-seen situations. The data generated by DummyNet improve the performance of several existing person detectors across various datasets, as well as in challenging situations such as night-time conditions, where only a limited amount of training data is available.
Semantic Synthesis of Pedestrian Locomotion
This work reformulates pedestrian trajectory forecasting as a structured reinforcement learning (RL) problem and proposes a hierarchical model consisting of a semantic trajectory policy network, which provides a distribution over possible movements, and a human locomotion network, which generates 3D human poses at each step.
Pose-Forecasting Aided Human Video Prediction With Graph Convolutional Networks
A novel Graph Convolutional Network based pose predictor that comprehensively models human body joints and forecasts their positions holistically, together with a stacked generative model with a temporal discriminator to iteratively refine the quality of the generated videos.
H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion
  • Hongyi Xu, Thiemo Alldieck, Cristian Sminchisescu
  • Computer Science
  • 2021
We present H-NeRF, neural radiance fields for rendering and temporal (4D) reconstruction of a human in motion, as captured by a sparse set of cameras or even from a monocular video. Our NeRF-inspired…
Motion-supervised Co-Part Segmentation
This work proposes a self-supervised deep learning method for co-part segmentation, developing the idea that motion information inferred from videos can be leveraged to discover meaningful object parts.
Novel-View Human Action Synthesis
This work presents a novel 3D reasoning approach to synthesize the target viewpoint of the target body, and introduces a context-based generator that learns to correct and complete residual appearance information.
Geometric Image Synthesis
This work proposes a trainable, geometry-aware image generation method that leverages various types of scene information, including geometry and segmentation, to create realistic-looking natural images that match the desired scene structure.
Human Appearance Transfer
The proposed architecture can be used to automatically generate images of a person dressed in clothing transferred from a person in another image, opening paths for applications in entertainment and photo editing, or affordable online shopping of clothing.
Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints
This paper leverages state-of-the-art deep multi-task neural networks and parametric human and scene modeling, towards a fully automatic monocular visual sensing system for multiple interacting people, which infers the 2D and 3D pose and shape of multiple people from a single image.
Unsupervised Person Image Synthesis in Arbitrary Poses
A novel approach for synthesizing photorealistic images of people in arbitrary poses using generative adversarial learning, which considers a pose-conditioned bidirectional generator that maps the initially rendered image back to the original pose, hence being directly comparable to the input image without the need to resort to any training image.
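The bidirectional mapping described above amounts to a cycle-consistency constraint: generate the target pose, map the result back, and compare against the input. The sketch below illustrates that idea only; the `generator` callable and the loss weighting are placeholders, not the paper's actual networks or objective.

```python
import numpy as np

def cycle_consistency_loss(image, pose_src, pose_tgt, generator):
    """Render the person in the target pose, map the result back to the
    source pose, and penalize the difference to the original image.
    No paired ground truth for pose_tgt is required."""
    rendered = generator(image, pose_tgt)      # source image -> target pose
    recovered = generator(rendered, pose_src)  # target pose  -> back to source
    return float(np.mean(np.abs(recovered - image)))  # L1 cycle loss

# Placeholder 'generator' that ignores the pose, just to exercise the loss.
identity_gen = lambda img, pose: img
img = np.random.rand(64, 64, 3)
loss = cycle_consistency_loss(img, pose_src=None, pose_tgt=None,
                              generator=identity_gen)
print(loss)  # 0.0 for an identity generator
```

In training, this reconstruction term is combined with an adversarial loss so the intermediate rendering in the target pose also looks realistic, which is what removes the need for paired pose supervision.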
Synthesizing Images of Humans in Unseen Poses
A modular generative neural network is presented that synthesizes unseen poses using training pairs of images and poses taken from human action videos. It separates a scene into different body-part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background.
A Generative Model of People in Clothing
The first image-based generative model of people in clothing for the full body is presented, which sidesteps the commonly used complex graphics rendering pipeline and the need for high-quality 3D scans of dressed people, and is learned from a large image database.
Dense Intrinsic Appearance Flow for Human Pose Transfer
We present a novel approach for the task of human pose transfer, which aims at synthesizing a new image of a person from an input image of that person and a target pose. We address the issues of…
Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis
Human perceptual studies and quantitative evaluations demonstrate the superiority of the Warping-GAN, which significantly outperforms all existing methods on two large datasets and is lightweight and flexible enough to be injected into any network.
Dense Pose Transfer
This work proposes a combination of surface-based pose estimation and deep generative models that allows accurate pose transfer, i.e. synthesizing a new image of a person based on a single image of that person and the image of a pose donor.
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
A new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs) is presented, which significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.