Corpus ID: 246863713

Understanding DDPM Latent Codes Through Optimal Transport

Valentin Khrulkov and I. Oseledets
Diffusion models have recently outperformed alternative approaches, such as GANs, at modeling the distribution of natural images. These models allow for deterministic sampling via the probability flow ODE, giving rise to a latent space and an encoder map. While this map has important practical applications, such as likelihood estimation, its theoretical properties are not yet fully understood. In the present work, we partially address this question for the popular case of the…
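To make the encoder map concrete, here is a minimal 1-D sketch (an illustration, not the paper's implementation): for a VP diffusion with β(t) = 1 and Gaussian data N(0, s²), the score is known in closed form, and integrating the probability flow ODE reduces the encoder to the linear scaling x ↦ (σ_T/σ_0)·x, which is exactly the monotone optimal transport map between the two Gaussians. All names and schedule choices below are assumptions for the toy setting.

```python
import numpy as np

# Toy probability-flow-ODE encoder for 1-D Gaussian data (illustrative sketch).
# VP forward SDE with beta(t) = 1: abar(t) = exp(-t), and for x0 ~ N(0, s^2)
# the marginal is p_t = N(0, sigma_t^2) with
#   sigma_t^2 = abar(t) * s^2 + 1 - abar(t),
# so the exact score is grad log p_t(x) = -x / sigma_t^2.

def encode(x0, s=2.0, T=5.0, n_steps=5000):
    """Euler-integrate the probability flow ODE from t=0 to t=T."""
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        t = k * dt
        abar = np.exp(-t)
        sigma2 = abar * s**2 + 1.0 - abar
        score = -x / sigma2
        x = x + dt * (-0.5 * x - 0.5 * score)  # drift f - 0.5*g^2*score, beta = 1
    return x

s, T = 2.0, 5.0
sigma0 = s
sigmaT = np.sqrt(np.exp(-T) * s**2 + 1.0 - np.exp(-T))
x0 = 1.5
xT = encode(x0, s=s, T=T)
# For Gaussian data the encoder is the linear OT map x -> (sigma_T/sigma_0) * x.
print(xT, (sigmaT / sigma0) * x0)
```

The printed values should nearly coincide, illustrating (in this analytically tractable case) the paper's theme that the DDPM encoder behaves like an optimal transport map.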


Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance

An alternative, Gaussian formulation of the latent spaces of various diffusion models is provided, together with an invertible DPM-Encoder that maps images into these latent spaces.

Maximum Likelihood Training of Implicit Nonlinear Diffusion Models

A data-adaptive, nonlinear diffusion process for score-based diffusion models that brings the training of INDM close to maximum likelihood estimation (MLE), in contrast to the non-MLE training of DDPM++.

PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior

PriorGrad is proposed to improve the efficiency of conditional diffusion models for speech synthesis by applying an adaptive prior derived from data statistics of the conditioning information. It achieves faster convergence and inference with superior performance, yielding improved perceptual quality and robustness to smaller network capacity, thereby demonstrating the efficiency of a data-dependent adaptive prior.

Diffusion Models in Vision: A Survey

A multi-perspective categorization of diffusion models applied in computer vision is introduced, relating them to variational autoencoders, generative adversarial networks, energy-based models, autoregressive models, and normalizing flows.

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models that transport between two empirically observed distributions π0 and π1.
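A minimal numerical sketch of the rectified-flow idea, under assumptions chosen for tractability: π0 and π1 are 1-D Gaussians under the independent coupling, so the rectified-flow velocity v(x, t) = E[X1 − X0 | X_t = x] is available in closed form (linear in x) and no neural network is needed. Integrating the resulting ODE transports π0 to π1 while preserving the intermediate marginals.

```python
import numpy as np

# Rectified-flow sketch for 1-D Gaussians pi_0 = N(m0, s0^2), pi_1 = N(m1, s1^2)
# under the independent coupling. The interpolation is X_t = (1-t)X0 + t*X1 and
# the rectified-flow velocity is v(x, t) = E[X1 - X0 | X_t = x], which is
# linear in x for jointly Gaussian (X0, X1).

m0, s0, m1, s1 = 0.0, 1.0, 3.0, 2.0

def velocity(x, t):
    mean_t = (1 - t) * m0 + t * m1
    var_t = (1 - t) ** 2 * s0**2 + t**2 * s1**2   # Var(X_t)
    cov = t * s1**2 - (1 - t) * s0**2             # Cov(X1 - X0, X_t)
    return (m1 - m0) + cov / var_t * (x - mean_t)

rng = np.random.default_rng(0)
x = rng.normal(m0, s0, size=20000)                # samples from pi_0
n_steps = 1000
dt = 1.0 / n_steps
for k in range(n_steps):
    x = x + dt * velocity(x, k * dt)              # Euler step of the flow ODE
print(x.mean(), x.std())                          # should approach m1, s1
```

The transported samples should match the target mean and standard deviation, illustrating that the ODE driven by the conditional-expectation velocity reproduces the marginals of the straight-line interpolation.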

DiffEdit: Diffusion-based semantic image editing with mask guidance

DiffEdit, a method that exploits text-conditioned diffusion models for semantic image editing, where the goal is to edit an image based on a text query, achieves state-of-the-art editing performance on ImageNet.

Improved Denoising Diffusion Probabilistic Models

This work shows that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality, and finds that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality.

Denoising Diffusion Implicit Models

Denoising diffusion implicit models (DDIMs) are presented, a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs that can produce high quality samples faster and perform semantically meaningful image interpolation directly in the latent space.
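The deterministic (η = 0) DDIM update can be sketched in a toy setting where no network is needed: for 1-D Gaussian data x0 ~ N(0, s²), the Bayes-optimal noise predictor has a closed form, and many-step DDIM sampling approaches the same linear map as the probability flow ODE. The schedule and constants below are illustrative assumptions.

```python
import numpy as np

# DDIM (eta = 0) sampling sketch for 1-D Gaussian data x0 ~ N(0, s^2).
# For x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps, the optimal noise predictor is
#   eps_hat(x_t, t) = sqrt(1-abar_t) * x_t / (abar_t*s^2 + 1 - abar_t),
# and the deterministic DDIM update is
#   x0_hat = (x_t - sqrt(1-abar_t)*eps_hat) / sqrt(abar_t)
#   x_prev = sqrt(abar_prev)*x0_hat + sqrt(1-abar_prev)*eps_hat.

s, T, n_steps = 2.0, 5.0, 500
ts = np.linspace(T, 0.0, n_steps + 1)          # continuous-time grid, abar = exp(-t)

x = 1.0                                        # a latent code x_T (approx N(0, 1))
for t_cur, t_prev in zip(ts[:-1], ts[1:]):
    abar, abar_p = np.exp(-t_cur), np.exp(-t_prev)
    eps_hat = np.sqrt(1 - abar) * x / (abar * s**2 + 1 - abar)
    x0_hat = (x - np.sqrt(1 - abar) * eps_hat) / np.sqrt(abar)
    x = np.sqrt(abar_p) * x0_hat + np.sqrt(1 - abar_p) * eps_hat

# In this Gaussian case the many-step DDIM map approaches the linear scaling
# x_T -> (sigma_0 / sigma_T) * x_T, with sigma_t^2 = abar_t*s^2 + 1 - abar_t.
sigma0, sigmaT = s, np.sqrt(np.exp(-T) * s**2 + 1 - np.exp(-T))
print(x, sigma0 / sigmaT)
```

With enough steps the two printed values nearly agree, showing how DDIM's deterministic iteration discretizes the same latent-to-image map discussed above.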

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models

This work proposes Iterative Latent Variable Refinement (ILVR), a method to guide the generative process in DDPM to generate high-quality images based on a given reference image, which allows adaptation of a single DDPM without any additional learning in various image generation tasks.

Progressive Distillation for Fast Sampling of Diffusion Models

A method to distill a trained deterministic diffusion sampler that uses many steps into a new diffusion model taking half as many sampling steps; it is shown that the full progressive distillation procedure takes no more time than training the original model.

Denoising Diffusion Probabilistic Models

High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.
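The forward ("destruction") half of this process is easy to sketch: with a variance-preserving schedule, noising admits a closed form, so any noise level is reachable in a single step. The schedule below is a common linear choice, used here only for illustration.

```python
import numpy as np

# Forward diffusion sketch: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps,
# eps ~ N(0, 1), where abar_t = prod_{s<=t} (1 - beta_s).

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)          # a common linear schedule
abar = np.cumprod(1.0 - betas)

x0 = 1.5                                       # a scalar "data" point
t = 600
eps = rng.standard_normal(100000)
xt = np.sqrt(abar[t]) * x0 + np.sqrt(1 - abar[t]) * eps

# Empirical moments match the closed form q(x_t | x_0).
print(xt.mean(), np.sqrt(abar[t]) * x0)
print(xt.std(), np.sqrt(1 - abar[t]))
```

The learned model's job is then the reverse direction: undoing this noising step by step to restore structure.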

Label-Efficient Semantic Segmentation with Diffusion Models

This paper investigates the intermediate activations from the networks that perform the Markov step of the reverse diffusion process and shows that these activations effectively capture the semantic information from an input image and appear to be excellent pixel-level representations for the segmentation problem.

Generative Modeling by Estimating Gradients of the Data Distribution

A new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching, which allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons.
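The Langevin sampling step itself is simple; here is a minimal sketch where the score is known analytically (a Gaussian target), rather than estimated with score matching as in the paper. The step size and chain counts are illustrative assumptions.

```python
import numpy as np

# Langevin-dynamics sampling with a known score. Target is N(mu, sig^2),
# whose score is grad log p(x) = -(x - mu) / sig^2; the update
#   x <- x + (step/2) * score(x) + sqrt(step) * z,   z ~ N(0, 1)
# converges (for small step sizes) to samples from the target.

mu, sig = 2.0, 1.0
step = 0.01
rng = np.random.default_rng(0)

x = rng.standard_normal(5000)                  # 5000 parallel chains, N(0,1) init
for _ in range(2000):
    score = -(x - mu) / sig**2
    x = x + 0.5 * step * score + np.sqrt(step) * rng.standard_normal(x.shape)

print(x.mean(), x.std())                       # approx mu = 2.0, sig = 1.0
```

In the actual method, the analytic score is replaced by a network trained with score matching, and sampling is annealed over a sequence of noise levels.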

Score-Based Generative Modeling through Stochastic Differential Equations

This work presents a stochastic differential equation (SDE) that smoothly transforms a complex data distribution into a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise.
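The forward SDE can be simulated directly; the sketch below uses Euler-Maruyama on a variance-preserving SDE with β = 1 (an illustrative assumption) and checks the simulated marginal against its closed form.

```python
import numpy as np

# Euler-Maruyama simulation of a VP forward SDE,
#   dx = -0.5*beta*x dt + sqrt(beta) dW,   beta = 1,
# checked against the closed-form marginal: for x0 ~ N(0, s^2),
#   x_T ~ N(0, exp(-T)*s^2 + 1 - exp(-T)).

rng = np.random.default_rng(0)
s, T, n_steps, n_paths = 2.0, 3.0, 500, 10000
dt = T / n_steps

x = rng.normal(0.0, s, size=n_paths)           # data distribution N(0, s^2)
for _ in range(n_steps):
    x = x - 0.5 * x * dt + np.sqrt(dt) * rng.standard_normal(n_paths)

target_std = np.sqrt(np.exp(-T) * s**2 + 1 - np.exp(-T))
print(x.std(), target_std)                     # simulated vs closed-form std
```

The reverse-time SDE of the paper runs this process backward using the (learned) score of the time-dependent marginals.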

Diffusion Models Beat GANs on Image Synthesis

It is shown that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models, and that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256×256 and 3.85 on ImageNet 512×512.