Closed-Form Factorization of Latent Semantics in GANs

@inproceedings{Shen2021ClosedFormFO,
  title={Closed-Form Factorization of Latent Semantics in GANs},
  author={Yujun Shen and Bolei Zhou},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={1532-1540}
}
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images. In order to identify such latent dimensions for image editing, previous methods typically annotate a collection of synthesized samples and train linear classifiers in the latent space. However, they require a clear definition of the target attribute as well as the corresponding manual annotations, limiting their applications in… 
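
Concretely, the closed-form factorization reduces to an eigenproblem: a unit direction n that maximally perturbs the output of a layer that directly transforms the latent code, i.e. maximizes ||An|| for that layer's weight A, is a top eigenvector of A^T A. Below is a minimal NumPy sketch of this idea; the function name, the choice of layer, and k are illustrative, not the authors' reference implementation.

```python
import numpy as np

def closed_form_directions(weight: np.ndarray, k: int = 5) -> np.ndarray:
    """Top-k semantic directions from a pre-trained weight matrix.

    weight: (out_dim, latent_dim) matrix of a layer that directly maps the
    latent code (e.g. the first affine/style projection in a StyleGAN-like
    generator). Unit directions n maximizing ||weight @ n|| are the
    eigenvectors of weight.T @ weight with the largest eigenvalues.
    """
    eigvals, eigvecs = np.linalg.eigh(weight.T @ weight)  # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]                   # largest first
    return eigvecs[:, top].T                              # (k, latent_dim), unit norm

# Editing with a discovered direction (alpha sets the edit strength):
# z_edit = z + alpha * directions[i]
```
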

Citations

Unsupervised Discovery, Control, and Disentanglement of Semantic Attributes With Applications to Anomaly Detection
TLDR
It is demonstrated that maximizing semantic attribute control encourages disentanglement of latent factors in generative networks, and that the approach has potential applications in addressing other important problems in computer vision, such as bias and privacy in AI.
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
TLDR
The latent style space of StyleGAN2, a state-of-the-art architecture for image generation, is explored, and StyleSpace, the space of channel-wise style parameters, is shown to be significantly more disentangled than the other intermediate latent spaces explored by previous works.
Unsupervised Image-to-Image Translation via Pre-Trained StyleGAN2 Network
TLDR
Both qualitative and quantitative evaluations were conducted to verify that the proposed I2I translation method can achieve better performance in terms of image quality, diversity and semantic similarity to the input and reference images compared to state-of-the-art works.
Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators?
TLDR
This work proposes 'Style Intervention', a lightweight optimization-based algorithm that can adapt to arbitrary input images and render natural translation effects under flexible objectives, and verifies the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
Enhanced 3DMM Attribute Control via Synthetic Dataset Creation Pipeline
TLDR
A novel pipeline for generating paired 3D faces by harnessing the power of GANs is designed and an enhanced non-linear 3D conditional attribute controller is proposed that increases the precision and diversity of 3D attribute control compared to existing methods.
Lifting 2D StyleGAN for 3D-Aware Face Generation
TLDR
Qualitative and quantitative results show the superiority of the approach over existing 3D-controllable GAN methods in content controllability while generating realistic, high-quality images.
Do 2D GANs Know 3D Shape? Unsupervised 3D shape reconstruction from 2D Image GANs
TLDR
This work presents the first attempt to directly mine 3D geometric clues from an off-the-shelf 2D GAN that is trained on RGB images only and finds that such a pre-trained GAN indeed contains rich 3D knowledge and thus can be used to recover 3D shape from a single 2D image in an unsupervised manner.
Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps
TLDR
This work introduces a generative adversarial network that can simultaneously generate aligned image samples from multiple related domains: the proposed Polymorphic-GAN learns features shared across all domains, plus a per-domain morph layer that morphs the shared features according to each domain.
Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs
TLDR
A novel submodular framework is designed that takes advantage of the latent space of channel-wise style parameters, the so-called stylespace, in which it clusters channels that perform similar manipulations into groups and promotes diversity by using the notion of clusters.
OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs
TLDR
A novel algorithm is presented which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture by performing independent component analysis on the pre-trained weight values of the generator.
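
As a rough illustration of the ICA idea mentioned above (not OptGAN's exact procedure; the layer choice and component count are assumptions), independent components can be extracted from the same kind of weight matrix used in the factorization sketch earlier:

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_directions(weight: np.ndarray, k: int = 10) -> np.ndarray:
    """Candidate latent directions via ICA on a pre-trained weight matrix.

    weight: (out_dim, latent_dim); rows are treated as observations.
    Returns k candidate directions of shape (k, latent_dim).
    """
    ica = FastICA(n_components=k, random_state=0)
    ica.fit(weight)
    return ica.components_
```
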
...

References

Showing 1-10 of 34 references
Interpreting the Latent Space of GANs for Semantic Face Editing
TLDR
This work proposes a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs, and finds that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations.
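
For contrast with the closed-form approach, here is a minimal sketch of this supervised pipeline, assuming a set of synthesized samples whose latent codes have already been annotated with a binary attribute; `attribute_direction` and the choice of `LinearSVC` are illustrative, not InterFaceGAN's exact recipe.

```python
import numpy as np
from sklearn.svm import LinearSVC

def attribute_direction(latents: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """latents: (n, latent_dim) codes of synthesized samples;
    labels: (n,) binary attribute annotations (e.g. smiling / not smiling).
    Returns the unit normal of the separating hyperplane, used as the
    editing direction for the labeled attribute."""
    clf = LinearSVC(C=1.0).fit(latents, labels)
    normal = clf.coef_.ravel()
    return normal / np.linalg.norm(normal)

# z_edit = z + alpha * direction moves the sample toward the positive label.
```
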
Unsupervised Discovery of Interpretable Directions in the GAN Latent Space
TLDR
This paper introduces an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN model by a simple model-agnostic procedure, and finds directions corresponding to sensible semantic manipulations without any form of (self-)supervision.
Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis
TLDR
This work shows that a highly structured semantic hierarchy emerges, as variation factors, in the deep generative representations learned by state-of-the-art GAN models for scene synthesis, such as StyleGAN and BigGAN, and quantifies the causality between the activations and the semantics occurring in the output image.
Controlling generative models with continuous factors of variations
TLDR
This paper proposes a new method to find meaningful directions in the latent space of any generative model along which one can move to precisely control specific properties of the generated image, such as the position or scale of the object in the image.
In-Domain GAN Inversion for Real Image Editing
TLDR
An in-domain GAN inversion approach which not only faithfully reconstructs the input image but also ensures that the inverted code is semantically meaningful for editing; it achieves satisfying real-image reconstruction and facilitates various image editing tasks, significantly outperforming the state of the art.
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial intelligence that learns in a similar manner to humans.
A Style-Based Generator Architecture for Generative Adversarial Networks
TLDR
An alternative generator architecture for generative adversarial networks is proposed, borrowing from style transfer literature, that improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.
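
The per-layer style injection is what makes layer weights such a direct handle on latent semantics. In the original style-based generator this is adaptive instance normalization (AdaIN); a minimal PyTorch sketch follows, with shapes and names chosen for illustration.

```python
import torch

def adain(x: torch.Tensor, scale: torch.Tensor, bias: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: normalize each feature map, then
    re-scale and shift it with style parameters predicted from the latent.

    x: (N, C, H, W) feature maps; scale, bias: (N, C) per-channel styles.
    """
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True)
    x_norm = (x - mu) / (sigma + eps)
    return scale[:, :, None, None] * x_norm + bias[:, :, None, None]
```
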
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
TLDR
This work proposes to amplify human effort through a partially automated labeling scheme, leveraging deep learning with humans in the loop, and constructs a new image dataset, LSUN, which contains around one million labeled images for each of 10 scene categories and 20 object categories.
GANSpace: Discovering Interpretable GAN Controls
TLDR
This paper describes a simple technique to analyze Generative Adversarial Networks and create interpretable controls for image synthesis, and shows that BigGAN can be controlled with layer-wise inputs in a StyleGAN-like manner.
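
The core of this technique is PCA over many sampled intermediate latents (e.g. StyleGAN's w = f(z)); the sketch below shows only that core, not the layer-wise transfer to BigGAN, and the sample and component counts are assumptions.

```python
import numpy as np

def pca_directions(w_samples: np.ndarray, k: int = 10) -> np.ndarray:
    """w_samples: (n, latent_dim) intermediate latents obtained by pushing
    many random z's through the mapping network. Returns the top-k
    principal directions (unit norm), usable as
    w_edit = w + alpha * directions[i]."""
    centered = w_samples - w_samples.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt sorted by variance
    return vt[:k]
```
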
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
TLDR
Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
...