Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

Xihui Liu, Zhe L. Lin, Jianming Zhang, Handong Zhao, Quan Hung Tran, Xiaogang Wang, Hongsheng Li
We propose a novel algorithm, named Open-Edit, which is the first attempt at open-domain image manipulation with open-vocabulary instructions. It is a challenging task considering the large variation of image domains and the lack of training supervision. Our approach takes advantage of the unified visual-semantic embedding space pretrained on a general image-caption dataset, and manipulates the embedded visual features by applying text-guided vector arithmetic on the image feature maps. A…
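As a rough illustration of what "text-guided vector arithmetic on the image feature maps" could look like, the sketch below shifts every spatial feature vector along the direction from a source-word embedding to a target-word embedding. The function name `edit_feature_map`, the strength parameter `alpha`, and the shift rule itself are assumptions made for illustration; the abstract does not give the exact formula, so this should not be read as the paper's method.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # height x width x channels


def edit_feature_map(features: FeatureMap, source_text: Vector,
                     target_text: Vector, alpha: float = 1.0) -> FeatureMap:
    """Shift every spatial feature vector along (target_text - source_text).

    Hypothetical illustration: the shift rule and `alpha` are assumptions,
    not the formula used in the paper.
    """
    # Direction in the shared embedding space from the source word to the target word.
    direction = [t - s for s, t in zip(source_text, target_text)]
    # Apply the same scaled shift at every spatial location of the feature map.
    return [
        [[v + alpha * d for v, d in zip(pixel, direction)] for pixel in row]
        for row in features
    ]


# Toy 1x2 feature map with 2 channels and two made-up word embeddings.
features = [[[1.0, 0.0], [0.5, 0.5]]]
green = [1.0, 0.0]
red = [0.0, 1.0]
edited = edit_feature_map(features, green, red, alpha=0.5)
```

In a real system the edited feature map would then be passed to an image decoder; that step is omitted here because the abstract does not describe it.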
Semantic Layout Manipulation with High-Resolution Sparse Attention
This work proposes a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512 and introduces a novel generator architecture consisting of a semantic encoder and a two-stage decoder for coarse-to-fine synthesis.
PSCC-Net: Progressive Spatio-Channel Correlation Network for Image Manipulation Detection and Localization
A Progressive Spatio-Channel Correlation Network (PSCC-Net) is proposed to detect and localize image manipulations; it captures both spatial and channel-wise correlations in the bottom-up path, enabling the network to cope with a wide range of manipulation attacks.
Language-Driven Image Style Transfer
The contrastive language visual artist (CLVA) is proposed, which learns to extract visual semantics from style instructions and accomplishes language-driven image style transfer (LDIST) with a patch-wise style discriminator; it also compares contrastive pairs of content images and style instructions to improve the mutual relativeness between transfer results.
Automatic Object Recoloring Using Adversarial Learning
This is the first algorithm where the automatic recoloring is only limited by the ability of the mask extractor to map a natural language tag to a specific object in the image (several hundred object types at the time of this writing).


ManiGAN: Text-Guided Image Manipulation
A novel generative adversarial network (ManiGAN) is proposed with two key components, a text-image affine combination module (ACM) and a detail correction module (DCM); it selects image regions relevant to the given text and then correlates those regions with the corresponding semantic words for effective manipulation.
XGAN: Unsupervised Image-to-Image Translation for many-to-many Mappings
XGAN ("Cross-GAN"), a dual adversarial autoencoder, is introduced, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions.
Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language
The text-adaptive generative adversarial network (TAGAN) is proposed to generate semantically manipulated images while preserving text-irrelevant contents of the original image.
Semantic Image Synthesis via Adversarial Learning
An end-to-end neural architecture is proposed that leverages adversarial learning to automatically learn implicit loss functions, which are optimized so that the generated images are realistic while matching the target text description.
Language-Based Image Editing with Recurrent Attentive Models
A generic modeling framework is proposed for two subtasks of language-based image editing (LBIE), language-based image segmentation and image colorization, using recurrent attentive models to fuse image and language features.
Language Guided Fashion Image Manipulation with Feature-wise Transformations
FiLMedGAN is proposed, which leverages feature-wise linear modulation (FiLM) to relate and transform visual features with natural language representations without using extra spatial information to generate an image that is as realistic as possible.
InstaGAN: Instance-aware Image-to-Image Translation
A novel method, coined instance-aware GAN (InstaGAN), is proposed that incorporates instance information and improves multi-instance transfiguration; it introduces a context-preserving loss that encourages the network to learn the identity function outside of target instances.
Generative Visual Manipulation on the Natural Image Manifold
This paper proposes to learn the natural image manifold directly from data using a generative adversarial neural network, defines a class of image editing operations, and constrains their output to lie on that learned manifold at all times.
StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation
A unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network, which leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain.
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
This work argues that convolutional kernels in the generator should be aware of the distinct semantic labels at different locations when generating images, and proposes a feature pyramid semantics-embedding discriminator, which is more effective in enhancing fine details and semantic alignments between the generated images and the input semantic layouts.