Corpus ID: 211096552

Image-to-Image Translation with Text Guidance

@article{Li2020ImagetoImageTW,
  title={Image-to-Image Translation with Text Guidance},
  author={Bowen Li and Xiaojuan Qi and Philip H. S. Torr and Thomas Lukasiewicz},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.05235}
}
The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, which allows text descriptions to determine the visual attributes of synthetic images. We propose four key components: (1) the implementation of part-of-speech tagging to filter out non-semantic words in the given description, (2) the adoption of an affine combination module to effectively fuse different-modality text and image…
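As a rough illustration of component (1), the sketch below keeps only content words via NLTK's part-of-speech tagger; which tags the authors actually retain is an assumption here, not taken from the paper.

# Minimal sketch of component (1): filter out non-semantic words by POS tag.
# Assumption: the filter keeps content words such as nouns and adjectives;
# the exact tag set used by the authors may differ.
import nltk

# Resource names vary slightly across NLTK versions.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def filter_semantic_words(description):
    tagged = nltk.pos_tag(nltk.word_tokenize(description))
    keep = {"NN", "NNS", "JJ"}  # nouns and adjectives (assumed tag set)
    return [word for word, tag in tagged if tag in keep]

print(filter_semantic_words("this bird has a red head and a short yellow beak"))
# -> ['bird', 'red', 'head', 'short', 'yellow', 'beak']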

Citations

Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation

TLDR
A new word-level discriminator is proposed, which provides the generator with fine-grained training feedback at the word level, facilitating the training of a lightweight generator that has a small number of parameters but can still correctly focus on specific visual attributes of an image and edit them without affecting other contents not described in the text.
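As a hypothetical sketch of the general idea (none of the sizes or the scoring rule below come from the paper), a word-level discriminator can emit one real/fake logit per word by scoring each word embedding against pooled image features:

import torch
import torch.nn as nn

class WordLevelDiscriminator(nn.Module):
    """Hypothetical sketch: one real/fake logit per word instead of one per
    sentence, so the generator gets word-level feedback. Sizes and the
    dot-product scoring rule are assumptions, not the paper's architecture."""
    def __init__(self, word_dim=256, img_dim=512):
        super().__init__()
        self.proj = nn.Linear(img_dim, word_dim)  # map image feats to word space

    def forward(self, word_feats, img_feats):
        # word_feats: (B, T, word_dim); img_feats: (B, img_dim, H, W)
        pooled = img_feats.flatten(2).mean(-1)            # (B, img_dim)
        pooled = self.proj(pooled).unsqueeze(2)           # (B, word_dim, 1)
        return torch.bmm(word_feats, pooled).squeeze(2)   # (B, T) per-word logits

disc = WordLevelDiscriminator()
logits = disc(torch.randn(4, 12, 256), torch.randn(4, 512, 16, 16))  # (4, 12)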

DT2I: Dense Text-to-Image Generation from Region Descriptions

TLDR
Dense text-to-image (DT2I) synthesis is introduced as a new task to pave the way toward more intuitive image generation, together with DTC-GAN, a novel method to generate images from semantically rich region descriptions, and a multi-modal region feature matching loss to encourage semantic image-text matching.

Semantic Text-to-Face GAN - ST^2FG

TLDR
This paper proposes a novel method for semantic text-to-face generation that outperforms current state-of-the-art methods such as ManiGAN and TediGAN across four different metrics when tested on the Multi-Modal CelebA-HQ benchmark.

Word-Level Fine-Grained Story Visualization

TLDR
This work introduces a new sentence representation, which incorporates word information from all story sentences to mitigate the inconsistency problem, and proposes a new discriminator with fusion features, further extending spatial attention to improve image quality and story consistency.

Scene Generated with Text Guidance (VAAB System)

TLDR
The proposed system is an application of the Refined Novel Generative Adversarial Network, which could bring a revolutionary change to the teaching-learning process.

Lightweight Long-Range Generative Adversarial Networks

TLDR
Novel lightweight generative adversarial networks are introduced, which can effectively capture long-range dependencies in the image generation process, and produce high-quality results with a much simpler architecture.
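For context, the standard way to capture long-range dependencies in GAN generators is a SAGAN-style self-attention block; the sketch below shows that generic block, not this paper's lighter-weight mechanism:

import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Generic SAGAN-style self-attention: every spatial position attends to
    all others, which is what captures long-range dependencies."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # start as identity mapping

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (B, HW, C//8) queries
        k = self.k(x).flatten(2)                   # (B, C//8, HW) keys
        attn = torch.softmax(q @ k, dim=-1)        # (B, HW, HW) pairwise weights
        v = self.v(x).flatten(2)                   # (B, C, HW) values
        out = v @ attn.transpose(1, 2)             # (B, C, HW)
        return x + self.gamma * out.view(B, C, H, W)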

References

Showing 1–10 of 30 references

ManiGAN: Text-Guided Image Manipulation

TLDR
A novel generative adversarial network (ManiGAN) is proposed, containing two key components: a text-image affine combination module (ACM), which selects image regions relevant to the given text and correlates them with the corresponding semantic words for effective manipulation, and a detail correction module (DCM), which rectifies mismatched attributes and completes missing contents.
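A minimal sketch of such an affine combination, assuming the ACM predicts a per-channel scale and shift from image features v and applies them to hidden features h (layer sizes here are illustrative, not the paper's configuration):

import torch
import torch.nn as nn

class AffineCombination(nn.Module):
    """Sketch of a text-image affine combination module (ACM):
    fused = h * W(v) + b(v), with W and b small conv nets on image features v.
    Layer sizes and depths are illustrative assumptions."""
    def __init__(self, ch):
        super().__init__()
        self.scale = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(ch, ch, 3, padding=1))
        self.shift = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, h, v):
        # h: hidden text-conditioned features; v: image features, same shape
        return h * self.scale(v) + self.shift(v)

acm = AffineCombination(64)
fused = acm(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))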

Semantic Image Synthesis via Adversarial Learning

TLDR
An end-to-end neural architecture that leverages adversarial learning to automatically learn implicit loss functions, which are optimized to fulfill two requirements: the synthesized image must be realistic while matching the target text description.

Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language

TLDR
The text-adaptive generative adversarial network (TAGAN) is proposed to generate semantically manipulated images while preserving text-irrelevant contents of the original image.

Image-to-Image Translation with Conditional Adversarial Networks

TLDR
Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
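For reference, the pix2pix objective pairs the conditional adversarial loss with an L1 reconstruction term:

\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]

G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1]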

Object-Driven Text-To-Image Synthesis via Adversarial Training

TLDR
A thorough comparison between the classic grid attention and the new object-driven attention is provided by analyzing their mechanisms and visualizing their attention layers, offering insights into how the proposed model generates complex scenes in high quality.

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

TLDR
A new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs) is presented, which significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.

Generative Adversarial Text to Image Synthesis

TLDR
A novel deep architecture and GAN formulation are developed to effectively bridge advances in text and image modeling, translating visual concepts from characters to pixels.
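A distinctive piece of this formulation is the matching-aware discriminator, trained on three kinds of pairs so it judges both realism and text-image matching; a condensed sketch of the discriminator loss (the D network itself is assumed to exist):

import torch
import torch.nn.functional as F

def matching_aware_d_loss(D, real_img, wrong_img, fake_img, text_emb):
    """Discriminator loss in the style of Reed et al.: a real image with
    matching text is 'real'; a real image with mismatched text and a
    generated image with matching text are both 'fake'.
    D(img, txt) -> logits; the 0.5 weighting follows the original algorithm."""
    s_real = D(real_img, text_emb)    # real image, matching text
    s_wrong = D(wrong_img, text_emb)  # real image from a different caption
    s_fake = D(fake_img, text_emb)    # generated image, matching text
    ones, zeros = torch.ones_like(s_real), torch.zeros_like(s_real)
    return (F.binary_cross_entropy_with_logits(s_real, ones)
            + 0.5 * (F.binary_cross_entropy_with_logits(s_wrong, zeros)
                     + F.binary_cross_entropy_with_logits(s_fake, zeros)))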

StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks

TLDR
This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
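Conditioning Augmentation resamples the conditioning vector from a Gaussian whose mean and diagonal variance are predicted from the sentence embedding, with a KL penalty toward the standard normal; a compact sketch (dimensions assumed):

import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """StackGAN-style CA: c_hat = mu(e) + sigma(e) * eps, eps ~ N(0, I),
    with a KL(N(mu, sigma^2) || N(0, I)) regularizer for a smooth manifold.
    Embedding and conditioning dimensions are assumptions."""
    def __init__(self, embed_dim=1024, cond_dim=128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, cond_dim * 2)  # predicts mu and log-var

    def forward(self, text_emb):
        mu, logvar = self.fc(text_emb).chunk(2, dim=-1)
        c_hat = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparam.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c_hat, kl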

InstaGAN: Instance-aware Image-to-Image Translation

TLDR
A novel method is proposed, coined instance-aware GAN (InstaGAN), that incorporates the instance information and improves multi-instance transfiguration and introduces a context preserving loss that encourages the network to learn the identity function outside of target instances.
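The context preserving loss can be read as an L1 identity penalty on pixels outside the instance masks of both the source and the translated image; the sketch below follows that reading (InstaGAN's exact weighting may differ):

import torch

def context_preserving_loss(x, y, mask_x, mask_y):
    """L1 penalty on pixels outside BOTH instance masks, pushing the network
    toward the identity function there. Masks are 1 on instances, 0 elsewhere.
    This is a common reading of the loss, not the paper's exact definition."""
    background = (1 - mask_x) * (1 - mask_y)  # region untouched in both images
    return torch.mean(background * torch.abs(x - y))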

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

TLDR
An Attentional Generative Adversarial Network that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation and for the first time shows that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image.
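A minimal sketch of the word-to-region attention at the core of this generator, where each image sub-region receives an attention-weighted word-context vector (dimensions illustrative):

import torch
import torch.nn as nn

class WordRegionAttention(nn.Module):
    """AttnGAN-style attention: each image sub-region attends over the words
    of the caption and receives a word-context vector. Dimensions are
    illustrative assumptions."""
    def __init__(self, word_dim, img_dim):
        super().__init__()
        self.proj = nn.Conv1d(word_dim, img_dim, 1)  # map words to image space

    def forward(self, words, regions):
        # words: (B, word_dim, T); regions: (B, img_dim, N) for N sub-regions
        w = self.proj(words)                                       # (B, img_dim, T)
        attn = torch.softmax(regions.transpose(1, 2) @ w, dim=-1)  # (B, N, T)
        context = w @ attn.transpose(1, 2)                         # (B, img_dim, N)
        return context

attn = WordRegionAttention(word_dim=256, img_dim=64)
ctx = attn(torch.randn(2, 256, 12), torch.randn(2, 64, 289))  # (2, 64, 289)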