Corpus ID: 233181770

On tuning consistent annealed sampling for denoising score matching

@article{Serr2021OnTC,
  title={On tuning consistent annealed sampling for denoising score matching},
  author={Joan Serr{\`a} and Santiago Pascual and Jordi Pons},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.03725}
}
Score-based generative models provide state-of-the-art quality for image and audio synthesis. Sampling from these models is performed iteratively, typically employing a discretized series of noise levels and a predefined scheme. In this note, we first overview three common sampling schemes for models trained with denoising score matching. Next, we focus on one of them, consistent annealed sampling, and study its hyper-parameter boundaries. We then highlight a possible formulation of such hyper… 
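
A minimal sketch may help fix notation for the sampler the note studies. The following Python illustration assumes a geometric noise schedule σ_{i+1} = γσ_i with γ < 1 and a score network score_fn(x, sigma) approximating ∇_x log p_σ(x); the function and argument names here are illustrative, not the paper's own.

```python
import numpy as np

def consistent_annealed_sampling(score_fn, x, sigmas, eta):
    """Hedged sketch of consistent annealed sampling (CAS).

    Each step is scaled so that, if the score estimate were exact, the
    noise level of x would contract exactly from sigma to sigma_next.
    `sigmas` is a decreasing geometric sequence of noise levels.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        gamma = sigma_next / sigma  # constant ratio < 1 for a geometric schedule
        # beta keeps the post-step noise level consistent with sigma_next;
        # it is real-valued only when eta >= 1 - gamma, which is the kind of
        # hyper-parameter boundary the note examines.
        beta = np.sqrt(max(0.0, 1.0 - ((1.0 - eta) / gamma) ** 2))
        x = (x + eta * sigma ** 2 * score_fn(x, sigma)
               + beta * sigma_next * np.random.randn(*x.shape))
    return x
```

The contrast with annealed Langevin sampling is that the injected noise is rescaled by β and the target level σ_{i+1} rather than by the Langevin step size, which is what makes the per-step noise level consistent by construction.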


Distribution Preserving Source Separation With Time Frequency Predictive Models

This work provides an example of a distribution-preserving source separation method, which aims to address perceptual shortcomings of state-of-the-art methods by means of mix-consistent sampling from a distribution conditioned on a realization of a mix.

Full-band General Audio Synthesis with Score-based Diffusion

This work proposes DAG, a diffusion-based generative model for general audio synthesis that deals with full-band signals end-to-end in the waveform domain; the authors argue that DAG is flexible enough to accommodate different conditioning schemas while providing good-quality synthesis.

Journey to the BAOAB-limit: finding effective MCMC samplers for score-based models

This work explores MCMC sampling algorithms that operate at a single noise level yet synthesize images with acceptable sample quality, beginning to approach competitive sample quality without using scores at large noise levels.

Universal Speech Enhancement with Score-based Diffusion

This work proposes to consider the task of speech enhancement as a holistic endeavor, and presents a universal speech enhancement system that tackles 55 different distortions at the same time, using a generative model that employs score-based diffusion and a multi-resolution conditioning network that performs enhancement with mixture density networks.

Adversarial score matching and improved sampling for image generation

This work proposes two improvements to DSM-ALS: 1) Consistent Annealed Sampling as a more stable alternative to Annealed Langevin Sampling, and 2) a hybrid training formulation, composed of both Denoising Score Matching and adversarial objectives.

Improved Techniques for Training Score-Based Generative Models

This work provides a new theoretical analysis of learning and sampling from score models in high dimensional spaces, explaining existing failure modes and motivating new solutions that generalize across datasets.

Generative Modeling by Estimating Gradients of the Data Distribution

A new generative model is introduced in which samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching; the approach allows flexible model architectures, requires no sampling during training or use of adversarial methods, and provides a learning objective that can be used for principled model comparisons.
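
As a one-line reference (in my notation, not a quote from the paper), the annealed Langevin update used by this sampler is

  x_{t+1} = x_t + \tfrac{\epsilon_i}{2}\, s_\theta(x_t, \sigma_i) + \sqrt{\epsilon_i}\, z_t, \qquad z_t \sim \mathcal{N}(0, I),

run for several steps at each noise level σ_1 > … > σ_L, with a step size such as ε_i = ε · σ_i²/σ_L² commonly used so that the signal-to-noise ratio of the update stays roughly constant across levels.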

How to Train Your Energy-Based Models

This tutorial starts by explaining maximum likelihood training with Markov chain Monte Carlo (MCMC), proceeds to elaborate on MCMC-free approaches, including Score Matching and Noise Contrastive Estimation, and highlights theoretical connections among these three approaches.
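
For context, the maximum-likelihood gradient that motivates MCMC training of an energy-based model p_θ(x) ∝ exp(−E_θ(x)) has the standard two-term form (sketched here in my notation):

  \nabla_\theta \log p_\theta(x) = -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta}\!\left[ \nabla_\theta E_\theta(x') \right],

where the intractable expectation over model samples is the part approximated with MCMC; score matching and noise contrastive estimation avoid it altogether.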

Score-Based Generative Modeling through Stochastic Differential Equations

This work presents a stochastic differential equation (SDE) that smoothly transforms a complex data distribution into a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise.
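
In the paper's framework, the forward SDE and its reverse-time counterpart take the form

  dx = f(x, t)\,dt + g(t)\,dw, \qquad dx = \left[ f(x, t) - g(t)^2\, \nabla_x \log p_t(x) \right] dt + g(t)\, d\bar{w},

where w̄ is a reverse-time Wiener process and the score ∇_x log p_t(x) is approximated by a time-conditional score model, so that integrating the reverse SDE from the prior yields data samples.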

SESQA: Semi-Supervised Learning for Speech Quality Assessment

This work tackles automatic speech quality assessment with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks.

DiffWave: A Versatile Diffusion Model for Audio Synthesis

DiffWave significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity from various automatic and human evaluations.

WaveGrad: Estimating Gradients for Waveform Generation

WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality.

Denoising Diffusion Probabilistic Models

High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
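
Concretely, the forward process adds Gaussian noise according to q(x_t | x_{t−1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I), and the model is trained with the simplified objective

  \mathbb{E}_{t, x_0, \epsilon}\left[ \left\| \epsilon - \epsilon_\theta\!\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t \right) \right\|^2 \right], \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s),

which is the connection to denoising score matching over multiple noise scales noted in the paper.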

Estimation of Non-Normalized Statistical Models by Score Matching

While estimating the gradient of a log-density function is, in principle, a very difficult non-parametric problem, a surprising result is proved: the objective admits a simple formula that reduces to a sample average of a sum of derivatives of the log-density given by the model.
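
The formula in question is Hyvärinen's score-matching objective: for a model score ψ_θ(x) = ∇_x log p_θ(x), the Fisher divergence to the data distribution reduces, up to a constant, to

  J(\theta) = \mathbb{E}_{p_{\text{data}}}\left[ \operatorname{tr}\!\left( \nabla_x \psi_\theta(x) \right) + \tfrac{1}{2} \left\| \psi_\theta(x) \right\|^2 \right] + \text{const},

which can be estimated as a sample average over data points, with no need for the normalizing constant of p_θ.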