Corpus ID: 232478933

Text to Image Generation with Semantic-Spatial Aware GAN

Kaiqin Hu, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn
A text-to-image generation (T2I) model aims to generate photo-realistic images that are semantically consistent with the text descriptions. Built upon recent advances in generative adversarial networks (GANs), existing T2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the conditional batch normalization methods are applied to the whole image feature maps equally, ignoring the local semantics; (2) the text encoder… 
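Limitation (1) can be made concrete with a minimal NumPy sketch of conditional batch normalization. This is a simplified illustration, not the paper's method; the projection matrices `W_gamma`/`W_beta` are hypothetical stand-ins for learned linear layers. The key point is that the text-predicted scale and shift are per-channel only, so every spatial position receives the same affine transform:

```python
import numpy as np

def conditional_batch_norm(x, text_emb, W_gamma, b_gamma, W_beta, b_beta, eps=1e-5):
    """Conditional batch normalization: scale/shift predicted from a text
    embedding and applied uniformly over every spatial location -- the
    spatially agnostic behaviour the abstract criticizes.

    x:        feature maps, shape (N, C, H, W)
    text_emb: sentence embedding, shape (N, D)
    W_gamma, W_beta: (D, C) projections (hypothetical learned parameters)
    """
    # Standard batch-norm statistics over the batch and spatial dimensions.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Text-conditioned affine parameters: one (gamma, beta) pair per channel,
    # broadcast identically to all H*W positions (no local semantics).
    gamma = text_emb @ W_gamma + b_gamma   # (N, C)
    beta = text_emb @ W_beta + b_beta      # (N, C)
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]
```

Because `gamma` and `beta` carry no spatial index, a word such as "red beak" rescales the bird's body and the background equally, which is exactly the ignored-local-semantics problem.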
You can try without visiting: a comprehensive survey on virtually try-on outfits
This study summarizes state-of-the-art image-based virtual try-on for both fashion detection and fashion synthesis, along with their respective advantages and drawbacks, and gives guidelines for selecting a specific try-on model, followed by its recent developments and successful applications.
SketchBird: Learning to Generate Bird Sketches from Text
A novel Generative Adversarial Network (GAN) based model is proposed by leveraging a Conditional Layer-Instance Normalization (CLIN) module, which can fuse the image features and sentence vector effectively and guide the sketch generation process.


StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
Extensive experiments demonstrate that the proposed stacked generative adversarial networks significantly outperform other state-of-the-art methods in generating photo-realistic images.
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
  • In CVPR, 2018
DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis
A Dynamic Aspect-awarE GAN (DAE-GAN) is proposed that, inspired by human learning behaviors, represents text information comprehensively at multiple granularities, including the sentence level, word level, and aspect level.
DF-GAN: Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis
This work proposes a novel simplified text-to-image backbone that synthesizes high-quality images directly with a single generator/discriminator pair, a novel regularization method called Matching-Aware zero-centered Gradient Penalty, and a novel fusion module that exploits the semantics of text descriptions effectively and fuses text and image features deeply during the generation process.
Controllable Text-to-Image Generation
A novel controllable text-to-image generative adversarial network (ControlGAN) is proposed, which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
Semantics Disentangling for Text-To-Image Generation
A novel photo-realistic text-to-image generation model is proposed that implicitly disentangles semantics to fulfill both high-level semantic consistency and low-level semantic diversity, together with a visual-semantic embedding strategy based on semantic-conditioned batch normalization to find diverse low-level semantics.
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis
  • Minfeng Zhu, P. Pan, Wei Chen, Yi Yang
  • In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
The proposed DM-GAN model introduces a dynamic memory module to refine fuzzy image contents, when the initial images are not well generated, and performs favorably against the state-of-the-art approaches.
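The memory-based refinement step can be sketched in a heavily simplified form. This is an assumption-laden illustration in NumPy, not DM-GAN's actual architecture: image region features attend over word features treated as memory slots, and each region is refined with its attended summary.

```python
import numpy as np

def memory_read(image_feats, word_feats):
    """Simplified dynamic-memory-style read (illustrative only):
    image_feats: (R, C) region features; word_feats: (T, C) word features.
    Each region softly selects relevant words and adds the read vector."""
    scores = image_feats @ word_feats.T              # (R, T) similarities
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over words
    read = attn @ word_feats                         # (R, C) memory read
    return image_feats + read                        # refined region features
```

The real model additionally gates which memory slots to write and how strongly to fuse the read result; the sketch only shows the attention-read skeleton.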
StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions, and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
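The Conditioning Augmentation idea can be sketched as follows (a minimal NumPy version; `W_mu` and `W_logvar` are hypothetical stand-ins for the learned projections): instead of feeding the text embedding to the generator directly, a conditioning vector is sampled from a text-dependent Gaussian via the reparameterization trick, which smooths the latent conditioning manifold.

```python
import numpy as np

def conditioning_augmentation(text_emb, W_mu, W_logvar, rng):
    """Sample a latent conditioning vector c ~ N(mu(e), diag(sigma(e)^2))
    from the text embedding e, using the reparameterization trick.
    text_emb: (N, D); W_mu, W_logvar: (D, K) hypothetical projections."""
    mu = text_emb @ W_mu                  # (N, K) conditioning mean
    logvar = text_emb @ W_logvar          # (N, K) log-variance
    eps = rng.standard_normal(mu.shape)   # standard Gaussian noise
    return mu + np.exp(0.5 * logvar) * eps
```

Sampling rather than copying the embedding gives the generator many plausible conditioning vectors per caption, which is what makes the conditioning manifold smooth.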
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
This work proposes a novel hierarchical approach for text-to-image synthesis by inferring semantic layout and shows that the model can substantially improve the image quality, interpretability of output and semantic alignment to input text over existing approaches.