Closed-Form Factorization of Latent Semantics in GANs

@article{Shen2021ClosedFormFO,
  title={Closed-Form Factorization of Latent Semantics in GANs},
  author={Yujun Shen and Bolei Zhou},
  journal={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={1532-1540}
}
  • Published 13 July 2020
  • Computer Science
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained to synthesize images. To identify such latent dimensions for image editing, previous methods typically annotate a collection of synthesized samples and train linear classifiers in the latent space. However, they require a clear definition of the target attribute as well as the corresponding manual annotations, limiting their applications in… 

Figures and Tables from this paper

Citations

Unsupervised Discovery, Control, and Disentanglement of Semantic Attributes With Applications to Anomaly Detection
It is demonstrated that maximizing semantic attribute control encourages disentanglement of latent factors in generative networks, and has potential applications in addressing other important problems in computer vision, such as bias and privacy in AI.
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
The latent style space of StyleGAN2, a state-of-the-art architecture for image generation, is explored, and StyleSpace, the space of channel-wise style parameters, is shown to be significantly more disentangled than the other intermediate latent spaces explored by previous works.
Unsupervised Image-to-Image Translation via Pre-Trained StyleGAN2 Network
Both qualitative and quantitative evaluations were conducted to verify that the proposed I2I translation method can achieve better performance in terms of image quality, diversity and semantic similarity to the input and reference images compared to state-of-the-art works.
Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators?
This work proposes 'Style Intervention', a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives and verifies the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs
A novel submodular framework is designed that takes advantage of the latent space of channel-wise style parameters, the so-called stylespace, in which it clusters channels that perform similar manipulations into groups and promotes diversity by using the notion of clusters.
OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs
A novel algorithm is presented which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture by performing independent component analysis on the pre-trained weight values of the generator.
Disentangled Representations from Non-Disentangled Models
This paper proposes to extract disentangled representations from state-of-the-art generative models trained without disentanglement terms in their objectives, using few or no hyperparameters when learning representations while achieving results on par with existing state-of-the-art models.
Do 2D GANs Know 3D Shape? Unsupervised 3D shape reconstruction from 2D Image GANs
This work presents the first attempt to directly mine 3D geometric clues from an off-the-shelf 2D GAN that is trained on RGB images only and finds that such a pre-trained GAN indeed contains rich 3D knowledge and thus can be used to recover 3D shape from a single 2D image in an unsupervised manner.
LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions
This work proposes a contrastive learning-based approach to discover semantic directions in the latent space of pre-trained GANs in a self-supervised manner that finds semantically meaningful dimensions compatible with state-of-the-art methods.
Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN
The proposed Multi-attribute Pizza Generator (MPG), a conditional Generative Adversarial Network (GAN) framework for synthesizing images from a trichotomy of attributes, is designed by extending the state-of-the-art StyleGAN2, using a new conditioning technique that guides the intermediate feature maps to learn multi-scale multi-attribute entangled representations of controlling attributes.

References

Showing 1-10 of 34 references
Interpreting the Latent Space of GANs for Semantic Face Editing
This work proposes a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs, and finds that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations.
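The linear-classifier approach that this line of work builds on (and that the main paper's abstract contrasts itself against) can be sketched as follows. A binary attribute classifier is fit on annotated latent codes, and the unit normal of its decision boundary is used as the editing direction. The data, dimensions, and attribute here are synthetic stand-ins, not anything from the cited papers.

```python
import numpy as np

# Hypothetical setup: latent codes z with binary attribute labels y
# (e.g. smiling vs. not smiling), as would come from manually
# annotating synthesized images. The "true" attribute axis is the
# first latent coordinate, so we can check recovery.
rng = np.random.default_rng(1)
n, d = 2000, 16
z = rng.standard_normal((n, d))
y = (z[:, 0] + 0.1 * rng.standard_normal(n) > 0).astype(float)

# Fit a linear classifier via plain logistic-regression gradient
# descent; its unit normal vector is the semantic editing direction.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(z @ w)))  # predicted probabilities
    w -= 0.1 * z.T @ (p - y) / n        # gradient step
direction = w / np.linalg.norm(w)

# Editing moves a latent code across the boundary: z' = z + alpha * n
z0 = rng.standard_normal(d)
z_edit = z0 + 2.0 * direction
```

The contrast with the closed-form method is that this pipeline needs per-attribute labels and a trained classifier, while the factorization approach needs neither.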
Controlling generative models with continuous factors of variations
This paper proposes a new method to find meaningful directions in the latent space of any generative model along which one can move to precisely control specific properties of the generated image, such as the position or scale of the object in the image.
In-Domain GAN Inversion for Real Image Editing
An in-domain GAN inversion approach that not only faithfully reconstructs the input image but also ensures the inverted code is semantically meaningful for editing; it achieves satisfying real image reconstruction and facilitates various image editing tasks, significantly outperforming the state of the art.
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial intelligence.
A Style-Based Generator Architecture for Generative Adversarial Networks
An alternative generator architecture for generative adversarial networks is proposed, borrowing from style transfer literature, that improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
This work proposes to amplify human effort through a partially automated labeling scheme, leveraging deep learning with humans in the loop, and constructs a new image dataset, LSUN, which contains around one million labeled images for each of 10 scene categories and 20 object categories.
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
On the "steerability" of generative adversarial networks
It is shown that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold, and it is hypothesized that the degree of distributional shift is related to the breadth of the training data distribution.
Image Processing Using Multi-Code GAN Prior
A novel approach, called mGANprior, is proposed to incorporate well-trained GANs as an effective prior for a variety of image processing tasks, by employing multiple latent codes to generate multiple feature maps at some intermediate layer of the generator and composing them with adaptive channel importance to recover the input image.
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
This work presents an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level, and provides open source interpretation tools to help researchers and practitioners better understand their GAN models.