Corpus ID: 232269768

Using latent space regression to analyze and leverage compositionality in GANs

@article{Chai2021UsingLS,
  title={Using latent space regression to analyze and leverage compositionality in GANs},
  author={Lucy Chai and Jonas Wulff and Phillip Isola},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.10426}
}
In recent years, Generative Adversarial Networks (GANs) have become ubiquitous in both research and public perception, but how GANs convert an unstructured latent code to a high-quality output is still an open question. In this work, we investigate regression into the latent space as a probe to understand the compositional properties of GANs. We find that combining the regressor and a pretrained generator provides a strong image prior, allowing us to create composite images from a collage of random…
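The core idea of the abstract, regressing a (possibly collaged) image back to a latent code and re-generating, can be illustrated with a toy linear stand-in. Everything here is an assumption for illustration: real GANs are deep convolutional networks and the paper's regressor is a trained encoder, not a least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
d_latent, d_image = 8, 64

# Toy linear "generator": a stand-in for a pretrained GAN generator
W = rng.standard_normal((d_image, d_latent))

def generate(z):
    return W @ z

# Fit a latent regressor by least squares on (image, latent) training pairs,
# mirroring the idea of learning to regress images back into latent space
Z = rng.standard_normal((1000, d_latent))
X = Z @ W.T                                   # images for the sampled latents
E, *_ = np.linalg.lstsq(X, Z, rcond=None)     # (d_image, d_latent) regression matrix

def regress(x):
    return x @ E

# Collage the left half of one generated image with the right half of another
x1 = generate(rng.standard_normal(d_latent))
x2 = generate(rng.standard_normal(d_latent))
mask = (np.arange(d_image) < d_image // 2).astype(float)
collage = mask * x1 + (1 - mask) * x2

# Re-generating from the regressed latent projects the collage back onto the
# generator's output manifold: the generator acts as an image prior
recomposed = generate(regress(collage))
```

In this linear sketch, images that already lie in the generator's range are reconstructed exactly, while the collage is mapped to the nearest coherent output the generator can produce.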
Ensembling with Deep Generative Views
This work uses StyleGAN2 as the source of generative augmentations and investigates whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement
This work presents a novel inversion scheme that extends current encoder-based inversion methods by introducing an iterative refinement mechanism, and presents a residual-based encoder, named ReStyle, which attains improved accuracy compared to current state-of-the-art encoders with a negligible increase in inference time.
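The iterative-refinement loop described above can be sketched with a toy linear generator and encoder (all names and the linear maps are assumptions for illustration; ReStyle itself uses deep networks and predicts residual latent offsets over several forward passes):

```python
import numpy as np

rng = np.random.default_rng(1)
d_latent, d_image = 8, 64
W = rng.standard_normal((d_image, d_latent))  # stand-in linear generator
encode = np.linalg.pinv(W)                    # stand-in encoder: image -> latent offset

target = W @ rng.standard_normal(d_latent)    # image to invert
z = np.zeros(d_latent)                        # start from a neutral/average latent
for step in range(5):
    residual = target - W @ z                 # what the current inversion misses
    z = z + encode @ residual                 # encoder predicts a latent correction
error = float(np.linalg.norm(target - W @ z))
```

Each iteration feeds the reconstruction error back through the encoder, so the inversion improves step by step instead of being committed to in a single forward pass.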
Towards Open-World Text-Guided Face Image Generation and Manipulation
This work proposes a unified framework for both face image generation and manipulation that produces diverse and high-quality images at an unprecedented resolution of 1024² from multimodal inputs, and supports open-world scenarios, including both image and text, without any re-training, fine-tuning, or post-processing.
Ensembling with Deep Generative Views Supplementary Material
In supplementary material, qualitative examples of the GAN reconstructions and the perturbation methods investigated in the main text are shown, at both fine and coarse layers of the latent code.
StyleFusion: Disentangling Spatial Segments in StyleGAN-Generated Images
StyleFusion, a new mapping architecture for StyleGAN, which takes as input a number of latent codes and fuses them into a single style code, results in a single harmonized image in which each semantic region is controlled by one of the input latent codes.
MixSyn: Learning Composition and Style for Multi-Source Image Synthesis
This work proposes MixSyn (read as “mixin’”) for learning novel fuzzy compositions from multiple sources and creating novel images as a mix of image regions corresponding to those compositions; it combines uncorrelated regions from multiple source masks into a coherent semantic composition.
Hallucinating Pose-Compatible Scenes
This work presents a large-scale generative adversarial network for pose-conditioned scene generation, significantly scale the size and complexity of training data, and designs a pose conditioning mechanism that drives the model to learn the nuanced relationship between pose and scene.
Temporally Consistent Semantic Video Editing
This work presents a simple yet effective method to facilitate temporally coherent video editing by minimizing the temporal photometric inconsistency by optimizing both the latent code and the pre-trained generator.
Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
Existing GAN inversion and editing methods work well for aligned objects with a clean background, such as portraits and animal faces, but often struggle for more difficult categories with complex…
Spatially-Adaptive Multilayer GAN Inversion and Editing
This work proposes a new method to invert and edit complex images in the latent space of GANs such as StyleGAN2, exploring inversion with a collection of layers and spatially adapting the inversion process to the difficulty of the image.
...

References

Showing 1–10 of 63 references
Interpreting the Latent Space of GANs for Semantic Face Editing
This work proposes a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs, and finds that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations.
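The linear-editing idea behind InterFaceGAN can be sketched in a few lines. This is a hedged toy: the attribute labels, dimensions, and the mean-difference estimator are illustrative assumptions (the paper fits a linear SVM in latent space and edits along the boundary normal):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 16
# Hypothetical ground-truth attribute direction (unknown to the method)
true_dir = rng.standard_normal(d)
true_dir /= np.linalg.norm(true_dir)

# Sampled latent codes with binary attribute labels (e.g. "smiling")
Z = rng.standard_normal((500, d))
labels = Z @ true_dir > 0

# Estimate the semantic direction from the labels; the difference of class
# means is a simple stand-in for the SVM boundary normal
n_hat = Z[labels].mean(axis=0) - Z[~labels].mean(axis=0)
n_hat /= np.linalg.norm(n_hat)

# Editing: move a latent code along the direction to strengthen the attribute
z = rng.standard_normal(d)
z_edit = z + 3.0 * n_hat
```

Because the edit is a linear step in latent space, the attribute strength can be dialed continuously by varying the step size.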
Exploiting GAN Internal Capacity for High-Quality Reconstruction of Natural Images
This work proposes to exploit the representation in intermediate layers of the GAN generator, shows that this leads to increased capacity, and presents preliminary results on exploiting the learned representation in the attention map of the generator to obtain an unsupervised segmentation of natural images.
Seeing What a GAN Cannot Generate
This work visualizes mode collapse at both the distribution level and the instance level, and deploys a semantic segmentation network to compare the distribution of segmented objects in the generated images with the target distribution in the training set.
Improving Inversion and Generation Diversity in StyleGAN using a Gaussianized Latent Space
This work shows that, under a simple nonlinear operation, the data distribution can be modeled as Gaussian and therefore expressed using sufficient statistics; this yields a simple Gaussian prior, which is used to regularize the projection of images into the latent space.
In-Domain GAN Inversion for Real Image Editing
An in-domain GAN inversion approach that not only faithfully reconstructs the input image but also ensures the inverted code is semantically meaningful for editing; it achieves satisfying real-image reconstruction and facilitates various image editing tasks, significantly outperforming the state of the art.
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
This work presents an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level, and provides open source interpretation tools to help researchers and practitioners better understand their GAN models.
A Style-Based Generator Architecture for Generative Adversarial Networks
An alternative generator architecture for generative adversarial networks is proposed, borrowing from style transfer literature, that improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.
On the "steerability" of generative adversarial networks
It is shown that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold, and it is hypothesized that the degree of distributional shift is related to the breadth of the training data distribution.
Image Processing Using Multi-Code GAN Prior
A novel approach, called mGANprior, is proposed to incorporate well-trained GANs as an effective prior for a variety of image processing tasks, employing multiple latent codes to generate multiple feature maps at an intermediate layer of the generator and composing them with adaptive channel importance to recover the input image.
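The channel-wise composition step in mGANprior can be sketched with synthetic feature maps. All shapes and the softmax weighting here are illustrative assumptions; the actual method optimizes the latent codes and importance weights to reconstruct a target image:

```python
import numpy as np

rng = np.random.default_rng(2)
n_codes, n_channels, spatial = 3, 4, 16
# Hypothetical intermediate feature maps, one per latent code (flattened spatially)
feats = rng.standard_normal((n_codes, n_channels, spatial))

# Adaptive channel importance: one weight per (code, channel), normalized
# across codes so each channel is a convex combination of the sources
logits = rng.standard_normal((n_codes, n_channels))
alpha = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Compose the feature maps channel-wise; the remaining generator layers would
# then render this composed feature map into the output image
composed = (alpha[:, :, None] * feats).sum(axis=0)
```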
Adversarial Latent Autoencoders
Autoencoder networks are unsupervised approaches that aim to combine generative and representational properties by simultaneously learning an encoder–generator map. Although studied extensively, the…
...