Corpus ID: 232478933

Text to Image Generation with Semantic-Spatial Aware GAN

@article{Hu2021TextTI,
  title={Text to Image Generation with Semantic-Spatial Aware GAN},
  author={Kaiqin Hu and Wentong Liao and Michael Ying Yang and Bodo Rosenhahn},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.00567}
}
A text-to-image generation (T2I) model aims to generate photo-realistic images which are semantically consistent with the text descriptions. Built upon the recent advances in generative adversarial networks (GANs), existing T2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) The conditional batch normalization methods are applied on the whole image feature maps equally, ignoring the local semantics; (2) The text encoder… 
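The first limitation can be made concrete with a small sketch. In conditional batch normalization the text embedding produces one (gamma, beta) pair per channel, which is then applied uniformly over every spatial position; a spatially aware variant gates that modulation with a per-pixel mask. The function names, the linear projections `W_gamma`/`W_beta`, and the mask input below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def conditional_batch_norm(x, text_emb, W_gamma, W_beta, eps=1e-5):
    """Plain conditional BN: the text embedding yields one (gamma, beta)
    per channel, applied identically at every spatial location."""
    # x: (N, C, H, W) feature maps; text_emb: (N, D) sentence embedding
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    gamma = text_emb @ W_gamma            # (N, C) text-conditioned scale
    beta = text_emb @ W_beta              # (N, C) text-conditioned shift
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]

def spatial_cond_batch_norm(x, text_emb, mask, W_gamma, W_beta, eps=1e-5):
    """Spatially aware variant: a mask in [0, 1] of shape (N, 1, H, W)
    (in the paper, predicted from image features) decides *where* the
    text conditioning is applied, leaving other regions unmodulated."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    gamma = text_emb @ W_gamma
    beta = text_emb @ W_beta
    modulated = gamma[:, :, None, None] * x_hat + beta[:, :, None, None]
    # Blend: text-modulated features where the mask is high, plain
    # normalized features elsewhere.
    return mask * modulated + (1.0 - mask) * x_hat
```

With an all-ones mask the spatial variant reduces to plain conditional BN, which is exactly the "applied on the whole image feature maps equally" behavior the abstract criticizes.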
You can try without visiting: a comprehensive survey on virtually try-on outfits
TLDR
This study summarizes state-of-the-art image-based virtual try-on for both fashion detection and fashion synthesis, as well as their respective advantages, drawbacks, and guidelines for selection of a specific try-on model, followed by its recent development and successful application.
SketchBird: Learning to Generate Bird Sketches from Text
TLDR
A novel Generative Adversarial Network (GAN) based model is proposed by leveraging a Conditional Layer-Instance Normalization (CLIN) module, which can fuse the image features and sentence vector effectively and guide the sketch generation process.

References

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
TLDR
Extensive experiments demonstrate that the proposed stacked generative adversarial networks significantly outperform other state-of-the-art methods in generating photo-realistic images.
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
  • In CVPR,
  • 2018
DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis
TLDR
A Dynamic Aspect-awarE GAN (DAE-GAN) that represents text information comprehensively from multiple granularities, including sentence-level, word-level, and aspect-level, is proposed and developed, inspired by human learning behaviors.
DF-GAN: Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis
TLDR
A novel simplified text-to-image backbone which is able to synthesize high-quality images directly by one pair of generator and discriminator, a novel regularization method called Matching-Aware zero-centered Gradient Penalty and a novel fusion module which can exploit the semantics of text descriptions effectively and fuse text and image features deeply during the generation process.
Controllable Text-to-Image Generation
TLDR
A novel controllable text-to-image generative adversarial network (ControlGAN) is proposed, which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
Semantics Disentangling for Text-To-Image Generation
TLDR
A novel photo-realistic text-to-image generation model that implicitly disentangles semantics to both fulfill the high-level semantic consistency and low-level semantic diversity, and a visual-semantic embedding strategy by semantic-conditioned batch normalization to find diverse low-level semantics.
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis
  • Minfeng Zhu, P. Pan, Wei Chen, Yi Yang
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
The proposed DM-GAN model introduces a dynamic memory module to refine fuzzy image contents, when the initial images are not well generated, and performs favorably against the state-of-the-art approaches.
StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
TLDR
This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
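The Conditioning Augmentation mentioned above can be sketched with the standard reparameterisation trick: the text embedding is mapped to a mean and log-variance, and the conditioning vector is sampled from the resulting Gaussian. The linear heads `W_mu`/`W_logvar` and all dimensions below are illustrative assumptions, not StackGAN's actual code:

```python
import numpy as np

def conditioning_augmentation(text_emb, W_mu, W_logvar, rng):
    """Sample a smoothed conditioning vector c ~ N(mu(e), diag(sigma(e)^2))
    from a text embedding e, via the reparameterisation trick."""
    mu = text_emb @ W_mu              # (N, Dc) mean of the latent Gaussian
    logvar = text_emb @ W_logvar      # (N, Dc) log-variance
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```

Because nearby text embeddings map to overlapping Gaussians, small perturbations of the sampled conditioning vector still correspond to plausible descriptions, which is the "smoothness in the latent conditioning manifold" the abstract refers to.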
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
TLDR
This work proposes a novel hierarchical approach for text-to-image synthesis by inferring semantic layout and shows that the model can substantially improve the image quality, interpretability of output and semantic alignment to input text over existing approaches.