LatentKeypointGAN: Controlling Images via Latent Keypoints - Extended Abstract

@article{He2022LatentKeypointGANCI,
  title={LatentKeypointGAN: Controlling Images via Latent Keypoints - Extended Abstract},
  author={Xingzhe He and Bastian Wandt and Helge Rhodin},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.03448}
}
Abstract

Generative adversarial networks (GANs) can now generate photo-realistic images. However, how to best control the image content remains an open challenge. We introduce LatentKeypointGAN, a two-stage GAN internally conditioned on a set of keypoints and associated appearance embeddings, which provide control over the position and style of the generated objects and their respective parts. A major difficulty that we address is disentangling the image into spatial and appearance factors with little…
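The conditioning mechanism the abstract describes can be illustrated with a rough sketch: keypoint locations and per-keypoint appearance embeddings are sampled from separate noise vectors, the keypoints are rendered as Gaussian heatmaps, and each heatmap spreads its embedding over space to form the feature map a convolutional generator decodes. The following is a minimal PyTorch sketch under our own assumptions; all names, shapes, and hyperparameters (render_heatmaps, KeypointConditionedGenerator, num_keypoints=10, and so on) are hypothetical illustrations, not the authors' implementation.

# Minimal sketch of keypoint-conditioned generation, assuming a
# LatentKeypointGAN-style setup: keypoint coordinates and per-keypoint
# appearance embeddings are sampled from noise, the coordinates are
# rendered as Gaussian heatmaps, and the heatmap-weighted embeddings
# form the spatial feature map a conv decoder turns into an image.
# All names and shapes are hypothetical, not the authors' code.
import torch
import torch.nn as nn


def render_heatmaps(keypoints, size, sigma=0.1):
    """Render (B, K, H, W) Gaussian heatmaps from (B, K, 2) coords in [-1, 1]."""
    ys = torch.linspace(-1, 1, size)
    xs = torch.linspace(-1, 1, size)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")    # each (H, W)
    grid = torch.stack([grid_x, grid_y], dim=-1)              # (H, W, 2)
    diff = grid[None, None] - keypoints[:, :, None, None, :]  # (B, K, H, W, 2)
    dist2 = (diff ** 2).sum(-1)
    return torch.exp(-dist2 / (2 * sigma ** 2))


class KeypointConditionedGenerator(nn.Module):
    def __init__(self, num_keypoints=10, embed_dim=64, base_size=16):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.embed_dim = embed_dim
        self.base_size = base_size
        # Separate noise-to-latent mappings for spatial and appearance
        # factors, mirroring the paper's goal of disentangling pose from style.
        self.to_keypoints = nn.Sequential(
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, num_keypoints * 2), nn.Tanh(),
        )
        self.to_embeddings = nn.Sequential(
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, num_keypoints * embed_dim),
        )
        self.decoder = nn.Sequential(  # feature map -> RGB image
            nn.Conv2d(embed_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear"),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z_spatial, z_style):
        b = z_spatial.size(0)
        kp = self.to_keypoints(z_spatial).view(b, self.num_keypoints, 2)
        emb = self.to_embeddings(z_style).view(b, self.num_keypoints, self.embed_dim)
        heat = render_heatmaps(kp, self.base_size)             # (B, K, H, W)
        # Spread each keypoint's appearance embedding over its heatmap.
        feat = torch.einsum("bkhw,bkc->bchw", heat, emb)
        return self.decoder(feat), kp


if __name__ == "__main__":
    gen = KeypointConditionedGenerator()
    img, kp = gen(torch.randn(2, 128), torch.randn(2, 128))
    print(img.shape, kp.shape)  # torch.Size([2, 3, 64, 64]) torch.Size([2, 10, 2])

In a sketch like this, resampling z_spatial while holding z_style fixed would move the parts without changing their style, and vice versa, which is the kind of position/style control the abstract claims.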
