Adversarial Synthesis of Human Pose from Text

@article{Zhang2020AdversarialSO,
  title={Adversarial Synthesis of Human Pose from Text},
  author={Yifei Zhang and Rania Briq and Julian Tanke and Juergen Gall},
  journal={Pattern Recognition},
  year={2020},
  volume={12544},
  pages={145--158}
}
This work focuses on synthesizing human poses from human-level text descriptions. We propose a model that is based on a conditional generative adversarial network. It is designed to generate 2D human poses conditioned on human-written text descriptions. The model is trained and evaluated using the COCO dataset, which consists of images capturing complex everyday scenes with various human poses. We show through qualitative and quantitative results that the model is capable of synthesizing… 
TIPS: Text-Induced Pose Synthesis
TLDR
This paper first presents the shortcomings of current pose transfer algorithms and then proposes a novel text-based pose transfer technique to address those issues, which generates promising results with significant qualitative and quantitative scores in the authors' experiments.
Towards Better Adversarial Synthesis of Human Images from Text
TLDR
This paper proposes an approach that generates multiple 3D human meshes from text based on the SMPL model and shows how using such a shape as input to image synthesis frameworks helps to constrain the network to synthesize humans with realistic human shapes.

References

Generative Adversarial Text to Image Synthesis
TLDR
A novel deep architecture and GAN formulation is developed to effectively bridge advances in text and image modeling, translating visual concepts from characters to pixels.
Towards Diverse and Natural Image Descriptions via a Conditional GAN
TLDR
A new framework based on Conditional Generative Adversarial Networks (CGAN) is proposed, which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content.
Pose Guided Person Image Generation
TLDR
The novel Pose Guided Person Generation Network (PG$^2$) is proposed, which synthesizes person images in arbitrary poses based on an image of that person and a novel pose.
Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis
TLDR
This paper proposes a new model, the Semantics-enhanced Generative Adversarial Network (SEGAN), for fine-grained text-to-image generation, and introduces two modules into it: a Semantic Consistency Module (SCM) and an Attention Competition Module (ACM).
Object-Driven Text-To-Image Synthesis via Adversarial Training
TLDR
A thorough comparison between the classic grid attention and the new object-driven attention is provided by analyzing their mechanisms and visualizing their attention layers, offering insights into how the proposed model generates complex scenes in high quality.
StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
TLDR
This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
Convolutional Pose Machines
TLDR
This work designs a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference in structured prediction tasks such as articulated pose estimation.
MirrorGAN: Learning Text-To-Image Generation by Redescription
TLDR
Thorough experiments on two public benchmark datasets demonstrate the superiority of MirrorGAN over other representative state-of-the-art methods.
Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge
TLDR
A novel text-to-image method called LeicaGAN is proposed, which formulates the multiple-priors learning phase as a textual-visual co-embedding (TVE) comprising a text-image encoder for learning semantic, texture, and color priors and a text-mask encoder for learning shape and layout priors.
Improved Training of Wasserstein GANs
TLDR
This work proposes an alternative to clipping weights: penalizing the norm of the gradient of the critic with respect to its input, which performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning.
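The gradient penalty summarized above can be written out as a critic loss. The following is a sketch in the standard notation of the WGAN-GP formulation; none of the symbols appear on this page, so treat them as assumptions: $D$ is the critic, $P_r$ and $P_g$ are the real and generated data distributions, $\hat{x}$ is a random interpolate between a real and a generated sample, and $\lambda$ is the penalty weight.

```latex
% Critic loss with gradient penalty (sketch; the notation is assumed, not taken from this page)
\begin{align}
  L &= \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
     - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
     + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
       \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
  \hat{x} &= \epsilon x + (1 - \epsilon)\,\tilde{x},
  \qquad \epsilon \sim U[0, 1].
\end{align}
```

The first two terms are the usual Wasserstein critic loss. The third term replaces weight clipping by pushing the critic's gradient norm toward 1 at the interpolated points.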