StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation
@article{Kocasari2021StyleMCMB,
  title   = {StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation},
  author  = {Umut Kocasari and Alara Dirik and Mert Tiftikci and Pinar Yanardag},
  journal = {2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year    = {2022},
  pages   = {3441-3450}
}
Discovering meaningful directions in the latent space of GANs to manipulate semantic attributes typically requires large amounts of labeled data. Recent work aims to overcome this limitation by leveraging the power of Contrastive Language-Image Pre-training (CLIP), a joint text-image model. While promising, these methods require several hours of preprocessing or training to achieve the desired manipulations. In this paper, we present StyleMC, a fast and efficient method for text-driven image…
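To make the approach described in the abstract concrete, the sketch below shows the basic pattern shared by CLIP-guided StyleGAN editing methods: optimize a single global direction in the generator's style space so that shifted images score higher against a target text prompt under CLIP. This is a minimal, hedged illustration, not the authors' implementation; the `ToyGenerator` stand-in, the prompt, and all hyperparameters are assumptions, and only the `clip` calls follow the real openai/CLIP package.

```python
import torch
import torch.nn as nn
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.float()  # avoid fp16/fp32 mismatches when backpropagating
for p in clip_model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the direction is trained

class ToyGenerator(nn.Module):
    """Stand-in for a StyleGAN-like generator: 512-d style code -> 224x224 RGB."""
    def __init__(self, style_dim: int = 512):
        super().__init__()
        self.net = nn.Linear(style_dim, 3 * 224 * 224)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s).view(-1, 3, 224, 224).tanh()

generator = ToyGenerator().to(device)
generator.requires_grad_(False)  # frozen; gradients still flow to the input

with torch.no_grad():  # encode the (arbitrary, illustrative) prompt once
    text_feat = clip_model.encode_text(clip.tokenize(["a smiling face"]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# One shared (global) direction applied to every sampled style code.
direction = torch.zeros(512, device=device, requires_grad=True)
opt = torch.optim.Adam([direction], lr=0.05)

for step in range(50):  # few optimization steps, reflecting the speed claim
    s = torch.randn(4, 512, device=device)  # batch of random style codes
    images = generator(s + direction)       # shift every code by the same direction
    img_feat = clip_model.encode_image(images)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    # CLIP loss: pull image embeddings toward the text embedding. A practical
    # method would add an identity/regularization term to preserve attributes
    # unrelated to the prompt.
    loss = 1.0 - (img_feat @ text_feat.T).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the direction is shared across all sampled codes rather than optimized per image, it can be found once and then applied to any latent code, which is what makes this family of edits fast at inference time.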
16 Citations
Bridging CLIP and StyleGAN through Latent Alignment for Image Editing
- Computer Science, ArXiv
- 2022
This paper bridges CLIP and StyleGAN through Latent Alignment (CSLA) to mine diverse manipulation directions without inference-time optimization, and can achieve GAN inversion, text-to-image generation, and text-driven image manipulation.
clip2latent: Text driven sampling of a pre-trained StyleGAN using denoising diffusion and CLIP
- Computer Science, ArXiv
- 2022
We introduce a new method to efficiently create text-to-image models from a pretrained CLIP and StyleGAN. It enables text-driven sampling with an existing generative model without any external data or…
Rank in Style: A Ranking-based Approach to Find Interpretable Directions
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2022
A method is proposed for automatically determining the most relevant and successful text-based edits for a pre-trained StyleGAN model, using a ranking approach that scores candidate edits based on a list of keywords.
StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
- Computer Science, SIGGRAPH
- 2022
The final model, StyleGAN-XL, sets a new state of the art in large-scale image synthesis and is the first to generate images at a resolution of 1024² at such a dataset scale.
CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Image Manipulation
- Computer Science, ArXiv
- 2022
This paper introduces the CLIP projection-augmentation embedding (PAE) as an optimization target to improve text-guided image manipulation, and demonstrates quantitatively and qualitatively that PAE enables more disentangled, interpretable, and controllable image manipulation with state-of-the-art quality and accuracy.
Text and Image Guided 3D Avatar Generation and Manipulation
- Computer Science, ArXiv
- 2022
This work proposes a novel 3D manipulation method that can manipulate both the shape and texture of a model using text- or image-based prompts such as ‘a young face’ or ‘a surprised face’, leveraging the power of the Contrastive Language-Image Pre-training (CLIP) model and a pre-trained 3D GAN designed to generate face avatars in order to manipulate meshes.
Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance
- Computer Science, ECCV
- 2022
This work proposes a conditional classifier-free guidance scheme to better guide the diffusion process along the direction from the referring expression to the target prompt, and shows that the proposed framework can serve as a simple but strong baseline for referring object manipulation.
StyleGAN-Human: A Data-Centric Odyssey of Human Generation
- Computer Science, ECCV
- 2022
This work takes a data-centric perspective and investigates multiple critical aspects of “data engineering”, which it believes complement current practice and improve generation quality on rare face poses compared to the long-tailed counterpart.
PaintInStyle: One-Shot Discovery of Interpretable Directions by Painting
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2022
This work proposes a framework that finds a specific manipulation direction using only a single simple sketch drawn on an image and performs image manipulations comparable with state-of-the-art methods.
ClipFace: Text-guided Editing of Textured 3D Morphable Models
- Computer Science, ArXiv
- 2022
A neural network is proposed that predicts both texture and expression latent codes of a morphable face model, enabling high-quality texture generation for 3D faces via adversarial self-supervised training guided by differentiable rendering against collections of real RGB images.
References
Showing 1-10 of 39 references
TediGAN: Text-Guided Diverse Image Generation and Manipulation
- Computer Science, ArXiv
- 2020
This work proposes TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions, along with Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images and corresponding semantic segmentation maps, sketches, and textual descriptions.
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This work explores leveraging the power of the recently introduced Contrastive Language-Image Pre-training (CLIP) models to develop a text-based interface for StyleGAN image manipulation that does not require manual effort.
Analyzing and Improving the Image Quality of StyleGAN
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This work redesigns the generator normalization, revisits progressive growing, and regularizes the generator to encourage good conditioning in the mapping from latent codes to images, thereby redefining the state of the art in unconditional image modeling.
ManiGAN: Text-Guided Image Manipulation
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
A novel generative adversarial network (ManiGAN) is proposed, containing two key components: a text-image affine combination module (ACM), which selects image regions relevant to the given text and correlates them with the corresponding semantic words for effective manipulation, and a detail correction module (DCM).
Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
- Computer Science, NeurIPS
- 2020
A new word-level discriminator is proposed, which provides the generator with fine-grained training feedback at the word level, facilitating the training of a lightweight generator with a small number of parameters that can still correctly focus on specific visual attributes of an image and edit them without affecting other contents not described in the text.
Designing an encoder for StyleGAN image manipulation
- Computer Science, ACM Transactions on Graphics
- 2021
This paper carefully studies the latent space of StyleGAN, the state-of-the-art unconditional generator, and suggests two principles for designing encoders that allow one to control the proximity of the inversions to regions on which StyleGAN was originally trained.
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are…
Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language
- Computer Science, NeurIPS
- 2018
The text-adaptive generative adversarial network (TAGAN) is proposed to generate semantically manipulated images while preserving text-irrelevant contents of the original image.
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
The latent style space of StyleGAN2, a state-of-the-art architecture for image generation, is explored, and StyleSpace, the space of channel-wise style parameters, is shown to be significantly more disentangled than the other intermediate latent spaces explored by previous works.
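To make "channel-wise style parameters" concrete, here is a minimal, hedged sketch of the editing pattern this implies: shift a single channel of a style code while leaving all other channels untouched. The tensor below stands in for a real StyleGAN2 style-space code, and the channel index and strength are arbitrary placeholders, not values from the paper.

```python
import torch

num_channels = 512
style = torch.randn(1, num_channels)  # stand-in for one image's style code

def edit_channel(s: torch.Tensor, channel: int, strength: float) -> torch.Tensor:
    """Return a copy of s with a single style channel shifted by `strength`."""
    edited = s.clone()
    edited[:, channel] += strength
    return edited

# In StyleSpace, a well-chosen channel controls one visual attribute (e.g. hair
# color) with little effect on others; 42 and 3.0 are illustrative values only.
edited_style = edit_channel(style, channel=42, strength=3.0)
```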
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
A new method is presented for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs); it significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.