Text to Image Generation with Semantic-Spatial Aware GAN
@article{Hu2021TextTI,
  title   = {Text to Image Generation with Semantic-Spatial Aware GAN},
  author  = {Kaiqin Hu and Wentong Liao and Michael Ying Yang and Bodo Rosenhahn},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2104.00567}
}
A text-to-image generation (T2I) model aims to generate photo-realistic images that are semantically consistent with the text descriptions. Built upon recent advances in generative adversarial networks (GANs), existing T2I models have made great progress. However, a close inspection of their generated images reveals two major limitations: (1) the conditional batch normalization methods are applied to the whole image feature maps equally, ignoring the local semantics; (2) the text encoder…
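To make the first limitation concrete, the following is a minimal NumPy sketch (illustrative only, not the paper's exact formulation): standard conditional batch normalization modulates every spatial location of a feature map with the same text-derived scale and shift, whereas a semantic-spatial variant blends the modulation through a per-location mask so that only text-relevant regions are transformed. The function names and the mask-blending scheme are assumptions for illustration.

```python
import numpy as np

def conditional_bn(feat, gamma, beta, eps=1e-5):
    # feat: (C, H, W); gamma, beta: (C,) predicted from the text embedding.
    # Standard conditional BN: the same modulation is applied at every
    # spatial location, which ignores local semantics.
    mu = feat.mean(axis=(1, 2), keepdims=True)
    var = feat.var(axis=(1, 2), keepdims=True)
    norm = (feat - mu) / np.sqrt(var + eps)
    return gamma[:, None, None] * norm + beta[:, None, None]

def spatial_conditional_bn(feat, gamma, beta, mask, eps=1e-5):
    # mask: (H, W) in [0, 1], e.g. predicted from text and image features.
    # The modulation is blended per location: where mask == 0 the feature
    # stays plainly normalized; where mask == 1 it gets the full
    # text-conditioned transform. (Hypothetical blending scheme.)
    mu = feat.mean(axis=(1, 2), keepdims=True)
    var = feat.var(axis=(1, 2), keepdims=True)
    norm = (feat - mu) / np.sqrt(var + eps)
    g = 1.0 + mask[None] * (gamma[:, None, None] - 1.0)
    b = mask[None] * beta[:, None, None]
    return g * norm + b
```

With an all-ones mask the spatial variant reduces to plain conditional BN, and with an all-zeros mask it leaves the normalized features unmodulated, which is exactly the degree of locality the uniform scheme lacks.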
2 Citations
You can try without visiting: a comprehensive survey on virtually try-on outfits
- Computer Science, Multim. Tools Appl.
- 2022
This study summarizes state-of-the-art image-based virtual try-on for both fashion detection and fashion synthesis, along with their respective advantages and drawbacks, and gives guidelines for selecting a specific try-on model, followed by its recent development and successful applications.
SketchBird: Learning to Generate Bird Sketches from Text
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
- 2021
A novel Generative Adversarial Network (GAN)-based model is proposed, leveraging a Conditional Layer-Instance Normalization (CLIN) module that effectively fuses the image features with the sentence vector and guides the sketch-generation process.
References
Showing 1-10 of 46 references
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2019
Extensive experiments demonstrate that the proposed stacked generative adversarial networks significantly outperform other state-of-the-art methods in generating photo-realistic images.
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
- CVPR
- 2018
DAE-GAN: Dynamic Aspect-aware GAN for Text-to-Image Synthesis
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
A Dynamic Aspect-awarE GAN (DAE-GAN), inspired by human learning behaviors, is proposed that represents text information comprehensively at multiple granularities, including the sentence level, word level, and aspect level.
DF-GAN: Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis
- Computer Science, ArXiv
- 2020
A novel simplified text-to-image backbone which is able to synthesize high-quality images directly by one pair of generator and discriminator, a novel regularization method called Matching-Aware zero-centered Gradient Penalty and a novel fusion module which can exploit the semantics of text descriptions effectively and fuse text and image features deeply during the generation process.
Controllable Text-to-Image Generation
- Computer Science, NeurIPS
- 2019
A novel controllable text-to-image generative adversarial network (ControlGAN) is proposed, which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
Semantics Disentangling for Text-To-Image Generation
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
A novel photo-realistic text-to-image generation model that implicitly disentangles semantics to fulfill both high-level semantic consistency and low-level semantic diversity, together with a visual-semantic embedding strategy based on semantic-conditioned batch normalization to find diverse low-level semantics.
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
The proposed DM-GAN model introduces a dynamic memory module to refine fuzzy image contents when the initial images are not well generated, and performs favorably against state-of-the-art approaches.
StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
- Computer Science, 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions, and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
Microsoft COCO: Common Objects in Context
- Computer Science, ECCV
- 2014
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene…
Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work proposes a novel hierarchical approach for text-to-image synthesis by inferring semantic layout and shows that the model can substantially improve the image quality, interpretability of output and semantic alignment to input text over existing approaches.