• Corpus ID: 234357997

Diffusion Models Beat GANs on Image Synthesis

  • Prafulla Dhariwal, Alex Nichol
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet… 
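
The classifier-guidance update described in the abstract shifts each reverse-process mean by a scaled classifier gradient; a minimal sketch (function and variable names are illustrative, not the authors' implementation):

```python
def guided_mean(mu, sigma2, grad_log_p_y, scale=1.0):
    """Shift the reverse-process mean by the scaled classifier gradient.

    mu:            mean predicted by the diffusion model at this step (list)
    sigma2:        variance of the reverse-process Gaussian at this step
    grad_log_p_y:  gradient of log p(y | x_t) from a classifier on noisy inputs
    scale:         guidance scale; larger values trade diversity for fidelity
    """
    return [m + scale * sigma2 * g for m, g in zip(mu, grad_log_p_y)]
```

Sampling then draws from a Gaussian with this shifted mean, so raising the scale pushes samples toward the classifier's notion of the target class at the cost of diversity.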

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

This work proposes DiVAE, a VQ-VAE-style architecture with a denoising diffusion decoder, to serve as the reconstructing component in image synthesis; it explores how to feed image embeddings into the diffusion model for strong performance and finds that a simple modification to the diffusion model's UNet achieves it.

Palette: Image-to-Image Diffusion Models

A unified framework for image-to-image translation based on conditional diffusion models is developed and it is shown that a generalist, multi-task diffusion model performs as well or better than task-specific specialist counterparts.

High-Resolution Image Synthesis with Latent Diffusion Models

These latent diffusion models achieve new state of the art scores for image inpainting and class-conditional image synthesis and highly competitive performance on various tasks, including unconditional image generation, text-to-image synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.

Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis

This work proposes a novel generative process that synthesizes images in a coarse-to-fine manner and proposes a blur diffusion as a special case, where each frequency component of an image is diffused at different speeds.

Improved Vector Quantized Diffusion Models

A high-quality inference strategy to alleviate the joint distribution issue in VQ-Diffusion is presented, and a more general and effective implementation of classifier-free guidance sampling for discrete denoising diffusion models is proposed.

Cascaded Diffusion Models for High Fidelity Image Generation

It is shown that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping to train cascading pipelines achieving FID scores of 1.48 at 64×64, 3.52 at 128×128 and 4.88 at 256×256 resolutions, outperforming BigGAN-deep.

Semantic Image Synthesis via Diffusion Models

This paper proposes a novel framework based on Denoising Diffusion Probabilistic Models for semantic image synthesis, and introduces a classifier-free guidance sampling strategy, which incorporates the scores of an unconditional model into the sampling process.
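
Classifier-free guidance, as used here, blends conditional and unconditional noise (score) estimates; a minimal sketch with illustrative names, not the paper's code:

```python
def cfg_noise(eps_uncond, eps_cond, w):
    """Blend unconditional and conditional noise predictions.

    w = 0 recovers the unconditional model, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional estimate for stronger guidance.
    """
    return [eu + w * (ec - eu) for eu, ec in zip(eps_uncond, eps_cond)]
```

Unlike classifier guidance, this requires no separately trained classifier: the same network is queried with and without the conditioning signal.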

Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

In experiments on the highly challenging and diverse ImageNet dataset, the scheme introduces significantly more intelligible intermediate gradients, better alignment with theoretical findings, and improved generation results under several evaluation metrics.

Dynamic Dual-Output Diffusion Models

  • Yaniv Benny, L. Wolf
  • Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2022
Some of the causes that affect the generation quality of diffusion models, especially when sampling with few iterations, are revealed, and a simple yet effective solution to mitigate them is proposed.

On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models

This work explores techniques for conditioning diffusion models with carefully crafted input noise artifacts, enabling the generation of images conditioned on semantic attributes; this differs from existing approaches, which input Gaussian noise and introduce conditioning only at the diffusion model's inference step.

Large Scale GAN Training for High Fidelity Natural Image Synthesis

It is found that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the generator's input.
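
The truncation trick mentioned above amounts to rejecting latent draws whose magnitude exceeds a threshold; a minimal sketch (the 0.5 default is an arbitrary example, not BigGAN's setting):

```python
import random

def truncated_latents(n, threshold=0.5, rng=random):
    """Draw n latent entries from N(0, 1), resampling any draw whose
    absolute value exceeds `threshold`. Smaller thresholds lower the
    input variance, trading sample variety for fidelity."""
    out = []
    while len(out) < n:
        z = rng.gauss(0.0, 1.0)
        if abs(z) <= threshold:
            out.append(z)
    return out
```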

Generating Diverse High-Fidelity Images with VQ-VAE-2

It is demonstrated that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GANs' known shortcomings such as mode collapse and lack of diversity.

Image Super-Resolution via Iterative Refinement

The effectiveness of SR3 is shown in cascaded image generation, where a generative model is chained with super-resolution models to synthesize high-resolution images with competitive FID scores on the class-conditional 256×256 ImageNet generation challenge.

Analyzing and Improving the Image Quality of StyleGAN

This work redesigns the generator normalization, revisits progressive growing, and regularizes the generator to encourage good conditioning in the mapping from latent codes to images, thereby redefining the state of the art in unconditional image modeling.

High-Fidelity Image Generation With Fewer Labels

This work demonstrates how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art on both unsupervised ImageNet synthesis, as well as in the conditional setting.

Denoising Diffusion Implicit Models

Denoising diffusion implicit models (DDIMs) are presented, a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs that can produce high quality samples faster and perform semantically meaningful image interpolation directly in the latent space.
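
The deterministic DDIM update (the eta = 0 case of the paper's sampler family) can be sketched as follows; names are illustrative:

```python
import math

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t:            current noisy sample (list of floats)
    eps:            the model's noise prediction at step t
    alpha_bar_t:    cumulative signal coefficient at step t
    alpha_bar_prev: the same coefficient at the (possibly distant) earlier step
    """
    # Predict the clean sample x_0 from x_t and the noise estimate.
    x0 = [(x - math.sqrt(1 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
          for x, e in zip(x_t, eps)]
    # Move toward x_0 while re-injecting the predicted noise direction.
    return [math.sqrt(alpha_bar_prev) * x0_i + math.sqrt(1 - alpha_bar_prev) * e
            for x0_i, e in zip(x0, eps)]
```

Because the update is deterministic, steps can be spaced far apart (alpha_bar_prev need not belong to the adjacent timestep), which is what allows DDIM to produce high-quality samples in far fewer iterations than DDPM ancestral sampling.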

Taming Transformers for High-Resolution Image Synthesis

It is demonstrated how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images.

Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images

This work presents a hierarchical VAE that, for the first time, outperforms the PixelCNN in log-likelihood on all natural image benchmarks; the generative process is visualized, showing that the VAEs learn efficient hierarchical visual representations.

Hierarchical Autoregressive Image Models with Auxiliary Decoders

It is shown that autoregressive models conditioned on discrete representations of images, which abstract away local detail, can produce high-fidelity reconstructions of images, and that they can be trained to produce samples with large-scale coherence.

Generating Images with Sparse Representations

This work proposes a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences, and which scales effectively to high resolution images.