Grasping the Arrow of Time from the Singularity: Decoding Micromotion in Low-dimensional Latent Spaces from StyleGAN

@article{Wu2022GraspingTA,
  title={Grasping the Arrow of Time from the Singularity: Decoding Micromotion in Low-dimensional Latent Spaces from StyleGAN},
  author={Qiucheng Wu and Yi-fan Jiang and Junru Wu and Kai Wang and Gong Zhang and Humphrey Shi and Zhangyang Wang and Shiyu Chang},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.12696}
}
The disentanglement of StyleGAN latent space has paved the way for realistic and controllable image editing, but does StyleGAN know anything about temporal motion, as it was only trained on static images? To study the motion features in the latent space of StyleGAN, in this paper, we hypothesize and demonstrate that a series of meaningful, natural, and versatile small, local movements (referred to as “micromotion”, such as expression, head movement, and aging effect) can be represented in low… 
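To make the abstract's central claim concrete, a short, self-contained sketch of the low-rank hypothesis follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it presumes per-frame latent codes have already been recovered by some GAN-inversion method, flattens each code to a single vector, and takes the top PCA component of the frame trajectory as the micromotion direction; all function and variable names are hypothetical.

# Minimal sketch of the low-rank micromotion hypothesis (hypothetical names).
# Given latent codes inverted from the frames of one micromotion clip (e.g., a
# face going from neutral to smiling), fit a rank-1 direction and re-apply it
# to a different identity's latent code.
import numpy as np

def micromotion_direction(codes: np.ndarray) -> np.ndarray:
    """codes: (T, D) per-frame latent codes for one clip; returns the unit
    direction capturing most of the frame-to-frame variation."""
    centered = codes - codes.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]  # top right-singular vector = first PCA component

def apply_micromotion(w: np.ndarray, direction: np.ndarray, alphas) -> np.ndarray:
    """Move a single latent code w (D,) along the direction; each alpha is
    one 'time step' of the synthesized micromotion."""
    return np.stack([w + a * direction for a in alphas])

# Usage (shapes only; a real pipeline would invert real frames, e.g. with an
# off-the-shelf encoder, and decode the edited codes with pretrained StyleGAN):
T, D = 8, 512
frames_w = np.random.randn(T, D)            # stand-in for inverted frame codes
d = micromotion_direction(frames_w)
clip = apply_micromotion(np.random.randn(D), d, np.linspace(0.0, 3.0, 16))
print(clip.shape)                           # (16, 512)

Decoding each row of clip with a StyleGAN generator would then render the micromotion on the new identity; the point of the sketch is only that a single direction (rank one) can parameterize the motion.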

References

SHOWING 1-10 OF 39 REFERENCES
Interpreting the Latent Space of GANs for Semantic Face Editing
TLDR
This work proposes a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs, and finds that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations (a minimal sketch of this linear-editing idea appears after the reference list).
StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
TLDR
This work presents a novel approach to video synthesis that greatly improves visual quality and drastically reduces the amount of training data and resources needed to generate videos.
Only a Matter of Style: Age Transformation Using a Style-Based Regression Model
TLDR
This work presents an image-to-image translation method that learns to directly encode real facial images into the latent space of a pre-trained unconditional GAN subject to a given aging shift, and employs a pre-trained age regression network to explicitly guide the encoder in generating the latent codes corresponding to the desired age.
Pivotal Tuning for Latent-based editing of Real Images
TLDR
This paper presents pivotal tuning, a brief training process that preserves editing quality while surgically changing the portrayed identity and appearance, and shows that pivotal tuning also accommodates a multitude of faces while introducing negligible distortion on the rest of the domain.
PIE: Portrait Image Embedding for Semantic Control
TLDR
This work presents the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image, and designs a novel hierarchical non-linear optimization problem to obtain the embedding.
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
TLDR
The latent style space of StyleGAN2, a state-of-the-art architecture for image generation, is explored, and StyleSpace, the space of channel-wise style parameters, is shown to be significantly more disentangled than the other intermediate latent spaces explored by previous works.
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
TLDR
StyleMapGAN is proposed: the intermediate latent space has spatial dimensions, and a spatially variant modulation replaces AdaIN, which makes embedding through an encoder more accurate than existing optimization-based methods while maintaining the properties of GANs.
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
TLDR
This work investigates how to effectively link the pretrained latent spaces of StyleGAN and CLIP, which in turn allows us to automatically extract semantically labeled edit directions from StyleGAN, finding and naming meaningful edit operations without any additional human guidance.
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
TLDR
This work rethinks the traditional image + video discriminator pair and designs a holistic discriminator that aggregates temporal information by simply concatenating frames' features, which decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024² videos for the first time (a toy sketch of such a discriminator appears after the reference list).
...
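As noted in the InterFaceGAN entry above, here is a hedged sketch of linear latent editing: fit a linear classifier that separates latent codes by a binary attribute and use the unit normal of its decision hyperplane as the edit direction. It assumes latent codes already labeled with the attribute; the toy data and names below are placeholders, and scikit-learn's LinearSVC stands in for any linear classifier.

# Sketch of InterFaceGAN-style linear latent editing (illustrative only).
import numpy as np
from sklearn.svm import LinearSVC

def attribute_direction(codes: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Fit a linear SVM on (N, D) latent codes with 0/1 attribute labels;
    the unit normal of the separating hyperplane is the edit direction."""
    svm = LinearSVC(C=1.0, max_iter=10_000).fit(codes, labels)
    n = svm.coef_[0]
    return n / np.linalg.norm(n)

def edit(w: np.ndarray, n: np.ndarray, alpha: float) -> np.ndarray:
    # alpha's sign and magnitude control the direction and strength of the edit.
    return w + alpha * n

rng = np.random.default_rng(0)
codes = rng.normal(size=(200, 512))                      # toy latent codes
labels = (codes @ rng.normal(size=512) > 0).astype(int)  # toy attribute labels
w_edited = edit(codes[0], attribute_direction(codes, labels), alpha=2.0)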
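And, as noted in the StyleGAN-V entry, a toy PyTorch sketch of a holistic video discriminator: a shared backbone extracts per-frame features, which are concatenated and scored by a single head, so one discriminator sees temporal context without a separate video branch. Layer sizes and structure below are illustrative, not the paper's architecture.

# Toy holistic video discriminator in the spirit of StyleGAN-V (illustrative).
import torch
import torch.nn as nn

class HolisticDiscriminator(nn.Module):
    def __init__(self, num_frames: int = 3, feat_dim: int = 64):
        super().__init__()
        # Shared per-frame feature extractor (toy CNN).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(feat_dim, feat_dim, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # The head scores the concatenation of all frames' features at once.
        self.head = nn.Linear(num_frames * feat_dim, 1)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        b, t = video.shape[:2]                      # video: (B, T, 3, H, W)
        feats = self.backbone(video.flatten(0, 1))  # (B*T, feat_dim)
        feats = feats.reshape(b, -1)                # concatenate frame features
        return self.head(feats)                     # (B, 1) realness score

score = HolisticDiscriminator()(torch.randn(2, 3, 3, 64, 64))
print(score.shape)  # torch.Size([2, 1])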