ManiGAN: Text-Guided Image Manipulation

@article{Li2020ManiGANTI,
  title={ManiGAN: Text-Guided Image Manipulation},
  author={Bowen Li and Xiaojuan Qi and Thomas Lukasiewicz and Philip H. S. Torr},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={7877-7886}
}
The goal of our paper is to semantically edit parts of an image matching a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text. To achieve this, we propose a novel generative adversarial network (ManiGAN), which contains two key components: text-image affine combination module (ACM) and detail correction module (DCM). The ACM selects image regions relevant to the given text and then correlates the… 
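The abstract only gestures at how the ACM works; below is a minimal PyTorch sketch of one plausible reading, in which hidden text-conditioned features are scaled and shifted by terms computed from the image features. Channel sizes, layer choices, and names are assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class AffineCombination(nn.Module):
    """Sketch of a text-image affine combination module (ACM).

    Hidden text-conditioned features h are modulated by a scale W(v)
    and a shift b(v) computed from image features v, so regions where
    the text is relevant can be reweighted while the remaining image
    content is carried through. All sizes here are assumptions.
    """

    def __init__(self, text_channels: int, image_channels: int):
        super().__init__()
        # Small convs mapping image features to per-pixel scale/shift
        # terms (one plausible parameterisation, not the paper's exact one).
        self.scale = nn.Conv2d(image_channels, text_channels, 3, padding=1)
        self.shift = nn.Conv2d(image_channels, text_channels, 3, padding=1)

    def forward(self, h: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # h: (B, Ct, H, W) text-conditioned hidden features
        # v: (B, Ci, H, W) image features from an encoder
        return h * self.scale(v) + self.shift(v)

# Usage: combine 64-channel text features with 128-channel image features.
acm = AffineCombination(text_channels=64, image_channels=128)
out = acm(torch.randn(1, 64, 32, 32), torch.randn(1, 128, 32, 32))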

Citations

DE-Net: Dynamic Text-guided Image Editing Adversarial Networks
TLDR
A Dynamic Editing Block (DEBlock), which dynamically combines spatial- and channel-wise manipulations for various editing requirements, and a Combination Weights Predictor (CWP), which predicts the combination weights for the DEBlock from inference over text and visual features.
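As a rough illustration of the idea in this summary, here is a hedged PyTorch sketch of a block that blends a channel-wise (FiLM-style) path and a spatial path using predicted combination weights; every name, shape, and layer choice is an assumption rather than DE-Net's actual design.

import torch
import torch.nn as nn

class DynamicEditBlock(nn.Module):
    """Sketch: blend channel-wise and spatial-wise text conditioning
    with combination weights predicted from the text vector."""

    def __init__(self, channels: int, text_dim: int):
        super().__init__()
        # Channel-wise path: per-channel scale/shift from the sentence vector.
        self.gamma = nn.Linear(text_dim, channels)
        self.beta = nn.Linear(text_dim, channels)
        # Spatial-wise path: a text-conditioned spatial gate.
        self.spatial = nn.Conv2d(channels + text_dim, 1, 3, padding=1)
        # Combination-weight predictor: two blending weights from text.
        self.cwp = nn.Linear(text_dim, 2)

    def forward(self, x, t):
        b, c, h, w = x.shape
        # Channel-wise modulation (FiLM-style).
        ch = x * self.gamma(t).view(b, c, 1, 1) + self.beta(t).view(b, c, 1, 1)
        # Spatial modulation: gate computed from features and broadcast text.
        t_map = t.view(b, -1, 1, 1).expand(b, t.shape[1], h, w)
        sp = x * torch.sigmoid(self.spatial(torch.cat([x, t_map], dim=1)))
        # Blend the two paths with predicted weights.
        w2 = torch.softmax(self.cwp(t), dim=-1)  # (B, 2)
        return w2[:, 0].view(b, 1, 1, 1) * ch + w2[:, 1].view(b, 1, 1, 1) * sp

deb = DynamicEditBlock(channels=64, text_dim=256)
out = deb(torch.randn(2, 64, 16, 16), torch.randn(2, 256))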
Image-to-Image Translation with Text Guidance
The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, which allows text descriptions to…
TediGAN: Text-Guided Diverse Image Generation and Manipulation
TLDR
This work proposes TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions, and proposes the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images and corresponding semantic segmentation map, sketch, and textual descriptions.
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
TLDR
This work proposes TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions using a control mechanism based on style-mixing, and proposes the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images and corresponding semantic segmentation map, sketch, and textual descriptions.
Semantic Text-to-Face GAN -ST^2FG
TLDR
This paper proposes a novel method for semantic text-to-face generation that outperforms current state-of-the-art methods such as ManiGAN and TediGAN across four different metrics on the Multi-Modal CelebA-HQ benchmark.
Embedding Arithmetic for Text-driven Image Transformation
TLDR
The SIMAT dataset is introduced to show that vanilla CLIP multimodal embeddings are not very well suited for text-driven image transformation, but that a simple finetuning on the COCO dataset can bring dramatic improvements.
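Embedding arithmetic of this kind is easy to sketch: move the image embedding along the direction between a source and a target caption embedding, then retrieve the nearest database image. The scaling factor and the placeholder embeddings below are assumptions.

import torch
import torch.nn.functional as F

def embedding_arithmetic(img_emb, src_text_emb, tgt_text_emb, lam=1.0):
    """Text-driven transformation in a joint embedding space: shift the
    image embedding along the (source caption -> target caption)
    direction. `lam` scales the edit; this recipe is an assumption."""
    edited = img_emb + lam * (tgt_text_emb - src_text_emb)
    return F.normalize(edited, dim=-1)

# Usage: retrieve the database image nearest to the edited embedding.
# `db_embs` would come from a CLIP-style image encoder (not shown).
db_embs = F.normalize(torch.randn(1000, 512), dim=-1)
query = embedding_arithmetic(
    F.normalize(torch.randn(512), dim=-1),   # image embedding
    F.normalize(torch.randn(512), dim=-1),   # source caption embedding
    F.normalize(torch.randn(512), dim=-1),   # target caption embedding
)
best = (db_embs @ query).argmax()  # index of the retrieved image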
Text2LIVE: Text-Driven Layered Image and Video Editing
TLDR
The key idea is to generate an edit layer (colour + opacity) that is composited over the original input, which constrains the generation process and maintains high fidelity to the original input via novel text-driven losses applied directly to the edit layer.
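The compositing step itself is ordinary alpha blending; the following sketch shows how an edit layer of colour and opacity would be composited over the input (the tensor layout is an assumption).

import torch

def composite_edit_layer(image, color, opacity):
    """Alpha-composite a generated edit layer (colour + opacity) over
    the original input. Where opacity is 0 the original pixels pass
    through unchanged, which is what preserves fidelity to the source.
    Layout (B, C, H, W) is an assumption."""
    return opacity * color + (1.0 - opacity) * image

img = torch.rand(1, 3, 256, 256)
color = torch.rand(1, 3, 256, 256)   # generated edit colours
alpha = torch.rand(1, 1, 256, 256)   # generated opacity map in [0, 1]
edited = composite_edit_layer(img, color, alpha)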
Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
TLDR
A new word-level discriminator is proposed that provides the generator with fine-grained, word-level training feedback, facilitating a lightweight generator with few parameters that can still correctly focus on specific visual attributes of an image and edit them without affecting other contents not described in the text.
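One hedged reading of a word-level discriminator, sketched in PyTorch: each word attends over image-region features and receives its own real/fake logit, so feedback is per visual attribute rather than per sentence. Shapes and the attention formulation are assumptions.

import torch
import torch.nn as nn

class WordLevelDiscriminator(nn.Module):
    """Sketch: score each word against the image regions it attends to,
    producing one logit per word instead of one per sentence."""

    def __init__(self, region_dim: int, word_dim: int):
        super().__init__()
        self.proj = nn.Linear(region_dim, word_dim)
        self.score = nn.Linear(word_dim, 1)

    def forward(self, regions, words):
        # regions: (B, R, region_dim), words: (B, T, word_dim)
        r = self.proj(regions)                                    # (B, R, word_dim)
        attn = torch.softmax(words @ r.transpose(1, 2), dim=-1)   # (B, T, R)
        word_context = attn @ r            # word-aligned region features
        # One real/fake logit per word.
        return self.score(word_context * words).squeeze(-1)       # (B, T)

d = WordLevelDiscriminator(region_dim=128, word_dim=256)
logits = d(torch.randn(2, 64, 128), torch.randn(2, 12, 256))  # (2, 12)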
Text as Neural Operator: Image Manipulation by Text Instruction
TLDR
This paper proposes a GAN-based method that treats text as neural operators to locally modify image features, and shows that the proposed model generates images of greater fidelity and semantic relevance and, when its output is used as an image query, leads to better retrieval performance.
Adversarial Text-to-Image Synthesis: A Review
...

References

Showing 1-10 of 56 references
Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language
TLDR
The text-adaptive generative adversarial network (TAGAN) is proposed to generate semantically manipulated images while preserving text-irrelevant contents of the original image.
Semantic Image Synthesis via Adversarial Learning
TLDR
An end-to-end neural architecture that leverages adversarial learning to automatically learn implicit loss functions, which are optimized to fulfill two requirements: the synthesized image must be realistic while matching the target text description.
Controllable Text-to-Image Generation
TLDR
A novel controllable text-to-image generative adversarial network (ControlGAN) is proposed, which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
TLDR
This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
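Conditioning Augmentation has a compact form: predict a mean and diagonal variance from the text embedding, sample the conditioning vector with the reparameterisation trick, and regularise with a KL term against a standard normal. A sketch with illustrative dimensions:

import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """Conditioning Augmentation: sample the conditioning vector from a
    Gaussian whose mean and (diagonal) log-variance are predicted from
    the text embedding, smoothing the conditioning manifold."""

    def __init__(self, text_dim: int, cond_dim: int):
        super().__init__()
        self.fc = nn.Linear(text_dim, cond_dim * 2)

    def forward(self, text_emb):
        mu, logvar = self.fc(text_emb).chunk(2, dim=-1)
        # Reparameterisation trick: c = mu + sigma * eps.
        c = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # KL term against N(0, I) regularises the conditioning space.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c, kl

ca = ConditioningAugmentation(text_dim=1024, cond_dim=128)
c, kl = ca(torch.randn(4, 1024))  # dimensions are illustrative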
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
TLDR
A new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs) is presented, which significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.
Fader Networks: Manipulating Images by Sliding Attributes
TLDR
A new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space is introduced, which results in much simpler training schemes and nicely scales to multiple attributes.
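A minimal sketch of the Fader-style training signal, assuming simplified stand-in modules: a latent discriminator tries to recover the attribute from the encoding, the encoder is trained to fool it, and the decoder receives the attribute explicitly, so varying the attribute at test time slides the output.

import torch
import torch.nn as nn

# Simplified stand-in modules (real models would be convolutional).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
decoder = nn.Linear(256 + 1, 3 * 64 * 64)   # latent + attribute
latent_disc = nn.Linear(256, 1)             # predicts the attribute from z

x = torch.rand(8, 3, 64, 64)
a = torch.randint(0, 2, (8, 1)).float()     # binary attribute label

z = encoder(x)
recon = decoder(torch.cat([z, a], dim=1))
rec_loss = (recon - x.flatten(1)).pow(2).mean()

# Discriminator learns to recover `a` from z; the encoder's adversarial
# term pushes toward 1 - a, stripping attribute information from z.
disc_loss = nn.functional.binary_cross_entropy_with_logits(latent_disc(z.detach()), a)
adv_loss = nn.functional.binary_cross_entropy_with_logits(latent_disc(z), 1 - a)
enc_dec_loss = rec_loss + 0.1 * adv_loss    # 0.1 is an arbitrary weight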
InstaGAN: Instance-aware Image-to-Image Translation
TLDR
A novel method is proposed, coined instance-aware GAN (InstaGAN), that incorporates the instance information and improves multi-instance transfiguration and introduces a context preserving loss that encourages the network to learn the identity function outside of target instances.
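The context-preserving loss admits a one-line sketch: penalise differences between input and output outside the instance masks, so the network behaves as an identity function there. Mask conventions and weighting below are assumptions.

import torch

def context_preserving_loss(x, y, masks):
    """Sketch of a context-preserving loss: penalise changes outside
    the target instance masks. `masks` is the union of instance masks
    in [0, 1]; the exact weighting in the paper may differ."""
    background = 1.0 - masks
    return (background * (x - y)).abs().mean()

x = torch.rand(1, 3, 128, 128)    # input image
y = torch.rand(1, 3, 128, 128)    # translated image
m = torch.zeros(1, 1, 128, 128)   # instance mask (empty here)
loss = context_preserving_loss(x, y, m)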
Generative Adversarial Text to Image Synthesis
TLDR
A novel deep architecture and GAN formulation is developed to effectively bridge advances in text and image modeling, translating visual concepts from characters to pixels.
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
TLDR
An Attentional Generative Adversarial Network that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation and for the first time shows that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image.
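Word-level attention of this kind is straightforward to sketch: each image region attends over the word embeddings, and the resulting word-context vectors condition the refinement of that region. Shapes below are assumptions.

import torch

def word_attention(regions, words):
    """Sketch of word-level attention for attentional generation:
    regions (B, R, D) attend over words (B, T, D), producing a
    word-context vector per region for the next refinement stage."""
    scores = regions @ words.transpose(1, 2)   # (B, R, T)
    attn = torch.softmax(scores, dim=-1)       # attention over words
    return attn @ words                        # (B, R, D) word context

regions = torch.randn(2, 64, 256)   # e.g. an 8x8 feature map, flattened
words = torch.randn(2, 12, 256)     # 12 word embeddings
context = word_attention(regions, words)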
Generative Visual Manipulation on the Natural Image Manifold
TLDR
This paper proposes to learn the natural image manifold directly from data using a generative adversarial neural network, and defines a class of image editing operations, and constrain their output to lie on that learned manifold at all times.
...