Aligning Latent and Image Spaces to Connect the Unconnectable

@article{Skorokhodov2021AligningLA,
  title={Aligning Latent and Image Spaces to Connect the Unconnectable},
  author={Ivan Skorokhodov and Grigorii Sotnikov and Mohamed Elhoseiny},
  journal={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={14124-14133}
}
In this work, we develop a method to generate infinite high-resolution images with diverse and complex content. It is based on a perfectly equivariant patch-wise generator with synchronous interpolations in the image and latent spaces. Latent codes, when sampled, are positioned on the coordinate grid, and each pixel is computed from an interpolation of the neighboring codes. We modify the AdaIN mechanism to work in such a setup and train a GAN model to generate images positioned between any two… 
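As a rough, unofficial sketch of that idea (not the authors' code), the fragment below anchors latent codes on a 1-D coordinate grid, linearly interpolates them per output column, and applies an AdaIN-style modulation with a different code at each spatial position; the function names, the 1-D simplification, and the affine mapping are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def interpolated_latents(anchors, out_w):
    """Resample latent codes anchored on a 1-D coordinate grid so that
    every output column gets a linear blend of its neighboring codes.

    anchors: (num_anchors, latent_dim) codes at equally spaced positions.
    returns: (out_w, latent_dim) per-column latent codes.
    """
    z = anchors.t().unsqueeze(0)   # (1, latent_dim, num_anchors)
    z = F.interpolate(z, size=out_w, mode="linear", align_corners=True)
    return z.squeeze(0).t()        # (out_w, latent_dim)

def spatial_adain(x, codes, affine):
    """AdaIN-style modulation with a different interpolated code per column.

    x:      (B, C, H, W) feature map.
    codes:  (W, latent_dim) per-column latent codes.
    affine: nn.Linear(latent_dim, 2 * C) producing scale and shift.
    """
    B, C, H, W = x.shape
    x = (x - x.mean(dim=(2, 3), keepdim=True)) / (x.std(dim=(2, 3), keepdim=True) + 1e-8)
    style = affine(codes)                          # (W, 2C)
    scale = style[:, :C].t().reshape(1, C, 1, W)   # broadcast over batch, height
    shift = style[:, C:].t().reshape(1, C, 1, W)
    return x * (1 + scale) + shift
```

For instance, `interpolated_latents(torch.randn(2, 512), 256)` blends two sampled codes across a 256-column strip, which is the kind of synchronous latent/image interpolation the abstract describes.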
Citations

OUR-GAN: One-shot Ultra-high-Resolution Generative Adversarial Networks
TLDR
The proposed OUR-GAN is the first one-shot ultra-high-resolution (UHR) image synthesis framework; it generates non-repetitive images at 4K or higher resolution from a single training image and improves visual coherence while maintaining diversity by adding vertical positional embeddings to the feature maps.
Third Time's the Charm? Image and Video Editing with StyleGAN3
TLDR
This work demonstrates that although StyleGAN3 can be trained on unaligned data, one can still use aligned data for training without hindering the ability to generate unaligned imagery, and proposes an encoding scheme that is trained solely on aligned data yet can still invert unaligned images.
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
TLDR
A completion method that uses a transformer for scene modeling, together with novel methods to enforce the properties of a 360-degree image on the output, outperforming state-of-the-art (SOTA) methods both qualitatively and quantitatively.
StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN
TLDR
This work shows that a pretrained StyleGAN, combined with a few simple operations and no additional architecture, can perform comparably to state-of-the-art methods on various tasks, including image blending, panorama generation, generation from a single image, controllable and local multimodal image-to-image translation, and attribute transfer.
Sound-Guided Semantic Video Generation
TLDR
This paper proposes a framework that generates realistic videos by leveraging a multimodal (sound-image-text) embedding space, and provides a new high-resolution landscape video dataset (audio-visual pairs) for the sound-guided video generation task.
Arbitrary-Scale Image Synthesis
TLDR
This work proposes scale-consistent positional encodings that are invariant to the generator's layer transformations, enabling the generation of arbitrary-scale images even at scales unseen during training.
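One generic way to get such scale consistency (a toy Fourier-feature grid, not necessarily the paper's exact design) is to fix the encoding frequencies in normalized world space and merely sample them more densely at higher resolutions:

```python
import torch

def fourier_grid(height, width, num_freqs=4):
    """Sample fixed world-space sin/cos frequencies on an arbitrary pixel
    grid, so a given spatial location gets the same code at any resolution."""
    ys = torch.linspace(0.0, 1.0, height)
    xs = torch.linspace(0.0, 1.0, width)
    y, x = torch.meshgrid(ys, xs, indexing="ij")          # each (H, W)
    freqs = (2.0 ** torch.arange(num_freqs)) * torch.pi   # fixed frequencies
    feats = []
    for f in freqs:
        feats += [torch.sin(f * x), torch.cos(f * x),
                  torch.sin(f * y), torch.cos(f * y)]
    return torch.stack(feats)  # (4 * num_freqs, H, W), resolution-agnostic
```

Because the frequencies are tied to normalized coordinates rather than pixel indices, `fourier_grid(256, 256)` and `fourier_grid(1024, 1024)` agree wherever they sample the same spatial location.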
InfinityGAN: Towards Infinite-Pixel Image Synthesis
TLDR
Experimental evaluation validates that InfinityGAN generates images with superior realism compared to baselines and features parallelizable inference, and several applications unlocked by the approach are shown, such as spatial style fusion, multimodal outpainting, and image inbetweening.
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
TLDR
This work rethinks the traditional image + video discriminator pair and designs a holistic discriminator that aggregates temporal information by simply concatenating frames' features, which decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024² videos for the first time.
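A minimal sketch of that discriminator idea, with made-up dimensions and a toy per-frame encoder (an illustration of feature concatenation, not the paper's architecture):

```python
import torch
import torch.nn as nn

class HolisticDiscriminator(nn.Module):
    """Encode each frame independently, concatenate the per-frame features,
    and score the whole clip with a single linear head."""

    def __init__(self, num_frames=3, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(  # shared per-frame feature extractor
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.head = nn.Linear(num_frames * feat_dim, 1)

    def forward(self, frames):  # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1))  # (B * T, feat_dim)
        return self.head(feats.view(B, -1))         # (B, 1) realism score
```

For example, `HolisticDiscriminator()(torch.randn(2, 3, 3, 64, 64))` yields one realism score per clip, with all temporal reasoning deferred to the concatenation in the head.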

References

Interpreting Spatially Infinite Generative Models
TLDR
This paper provides a firm theoretical interpretation of infinite spatial generation by drawing connections to spatial stochastic processes, and uses the resulting intuition to improve upon existing spatially infinite generative models, enabling more efficient training through a model called an infinite generative adversarial network, or $\infty$-GAN.
COCO-GAN: Generation by Parts via Conditional Coordinating
TLDR
COnditional COordinate GAN (COCO-GAN) is proposed, in which the generator generates images by parts, conditioned on their spatial coordinates, and the discriminator learns to justify realism across multiple assembled patches by global coherence, local appearance, and edge-crossing continuity.
Few-Shot Unsupervised Image-to-Image Translation
TLDR
The model achieves its few-shot generation capability by coupling an adversarial training scheme with a novel network design; the effectiveness of the proposed framework is verified through extensive experimental validation and comparisons to several baseline methods on benchmark datasets.
SWAGAN: A Style-based Wavelet-driven Generative Model
TLDR
A novel general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain, retaining the qualities that allow StyleGAN to serve as a basis for a multitude of editing tasks while inducing improved downstream visual quality.
Large Scale GAN Training for High Fidelity Natural Image Synthesis
TLDR
It is found that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the generator's input.
Spiral Generative Network for Image Extrapolation
TLDR
A novel Spiral Generative Network, SpiralNet, performs image extrapolation in a spiral manner, regarding extrapolation as an evolution process that grows from an input sub-image along a spiral curve into an expanded full image.
Analyzing and Improving the Image Quality of StyleGAN
TLDR
This work redesigns the generator normalization, revisits progressive growing, and regularizes the generator to encourage good conditioning in the mapping from latent codes to images, thereby redefining the state of the art in unconditional image modeling.
Positional Encoding as Spatial Inductive Bias in GANs
TLDR
This work shows that SinGAN's impressive capability in learning an internal patch distribution is, to a large extent, brought about by the implicit positional encoding introduced by zero padding in the generators, and proposes a new multi-scale training strategy, demonstrating its effectiveness in the state-of-the-art unconditional generator StyleGAN2.
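The zero-padding observation is easy to reproduce in a few lines: push a constant image through a randomly initialized, zero-padded convolutional stack, and the outputs vary with location, revealing that position leaks in through the borders. A minimal toy demo (the network here is arbitrary, not the paper's):

```python
import torch
import torch.nn as nn

# Toy zero-padded conv stack; weights are random, input is constant.
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)
with torch.no_grad():
    out = net(torch.ones(1, 1, 16, 16))  # constant input: no content cues
# If the stack were fully translation-equivariant, all outputs would match;
# zero padding makes border responses differ, so position is implicitly encoded.
print(out[0, 0, 0, 0].item(), out[0, 0, 8, 8].item())  # corner vs. centre differ
```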
Learning Texture Manifolds with the Periodic Spatial GAN
TLDR
It is shown that image generation with PSGANs has the properties of a texture manifold: the model can smoothly interpolate between samples in the structured noise space and generate novel samples that lie perceptually between the textures of the original dataset.
SinGAN: Learning a Generative Model From a Single Natural Image
We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is…