On tuning consistent annealed sampling for denoising score matching
@article{Serr2021OnTC,
  title   = {On tuning consistent annealed sampling for denoising score matching},
  author  = {Joan Serr{\`a} and Santiago Pascual and Jordi Pons},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2104.03725}
}
Score-based generative models provide state-of-the-art quality for image and audio synthesis. Sampling from these models is performed iteratively, typically employing a discretized series of noise levels and a predefined scheme. In this note, we first overview three common sampling schemes for models trained with denoising score matching. Next, we focus on one of them, consistent annealed sampling, and study its hyper-parameter boundaries. We then highlight a possible formulation of such hyper…
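As a companion to the abstract, here is a minimal sketch (not the authors' code) of two of the sampling schemes it refers to: annealed Langevin sampling (ALS, from Song & Ermon, 2019) and consistent annealed sampling (CAS, from Jolicoeur-Martineau et al., 2020), whose hyper-parameters the note studies. The toy Gaussian target, the analytic score_fn, the schedule length, and the values of eta and eps are assumptions made purely for illustration.

```python
# Sketch of ALS vs. CAS updates on a toy target with a known analytic score.
# All numeric choices are illustrative, not tuned values from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy data distribution: a 2-D Gaussian centered at mu. The score of the
# smoothed density p_sigma(x) = N(x; mu, (1 + sigma^2) I) is analytic.
mu = np.array([2.0, -1.0])

def score_fn(x, sigma):
    # grad_x log p_sigma(x) for the toy Gaussian target.
    return (mu - x) / (1.0 + sigma**2)

# Geometric noise schedule sigma_1 > ... > sigma_L with ratio
# gamma = sigma_{i+1} / sigma_i (constant for a geometric schedule).
L = 200
sigmas = np.geomspace(10.0, 0.01, L)
gamma = sigmas[1] / sigmas[0]

def als(x, eps=2e-5, n_steps_each=5):
    # Annealed Langevin sampling: several Langevin steps per noise level,
    # with step size alpha_i = eps * sigma_i^2 / sigma_L^2.
    for sigma in sigmas:
        alpha = eps * sigma**2 / sigmas[-1]**2
        for _ in range(n_steps_each):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

def cas(x, eta=0.05):
    # Consistent annealed sampling: one update per noise level, with the
    # noise coefficient beta chosen so the sample's noise level follows
    # the schedule exactly. This requires eta >= 1 - gamma, one of the
    # hyper-parameter boundaries the note examines.
    beta = np.sqrt(max(0.0, 1.0 - (1.0 - eta)**2 / gamma**2))
    for i, sigma in enumerate(sigmas):
        sigma_next = sigmas[i + 1] if i + 1 < L else 0.0
        z = rng.standard_normal(x.shape)
        x = x + eta * sigma**2 * score_fn(x, sigma) + beta * sigma_next * z
    return x

x0 = sigmas[0] * rng.standard_normal(2)  # start from pure noise
print("ALS sample:", als(x0.copy()))
print("CAS sample:", cas(x0.copy()))
```

In this sketch, both samplers share the same noise schedule and score; they differ only in the per-level update rule and in how the injected noise is scaled, which is the comparison the note's hyper-parameter analysis builds on.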
4 Citations
Distribution Preserving Source Separation With Time Frequency Predictive Models
- Computer Science, ArXiv
- 2023
This work presents a distribution-preserving source separation method, which addresses perceptual shortcomings of state-of-the-art methods by means of mix-consistent sampling from a distribution conditioned on a realization of the mix.
Full-band General Audio Synthesis with Score-based Diffusion
- Computer Science, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2023
This work proposes DAG, a diffusion-based generative model for general audio synthesis that deals with full-band signals end-to-end in the waveform domain; the authors argue that DAG is flexible enough to accommodate different conditioning schemas while providing good-quality synthesis.
Journey to the BAOAB-limit: finding effective MCMC samplers for score-based models
- Computer Science
- 2022
This work explores MCMC sampling algorithms that operate at a single noise level yet synthesize images with acceptable sample quality, showing that such samplers begin to approach competitive sample quality without using scores at large noise levels.
Universal Speech Enhancement with Score-based Diffusion
- Computer Science, ArXiv
- 2022
This work proposes to consider the task of speech enhancement as a holistic endeavor, and presents a universal speech enhancement system that tackles 55 different distortions at the same time, using a generative model that employs score-based diffusion and a multi-resolution conditioning network that performs enhancement with mixture density networks.
12 References
Adversarial score matching and improved sampling for image generation
- Computer Science, ArXiv
- 2020
This work proposes two improvements to DSM-ALS: 1) Consistent Annealed Sampling as a more stable alternative to Annealed Langevin Sampling, and 2) a hybrid training formulation, composed of both Denoising Score Matching and adversarial objectives.
Improved Techniques for Training Score-Based Generative Models
- Computer Science, NeurIPS
- 2020
This work provides a new theoretical analysis of learning and sampling from score models in high dimensional spaces, explaining existing failure modes and motivating new solutions that generalize across datasets.
Generative Modeling by Estimating Gradients of the Data Distribution
- Computer Science, NeurIPS
- 2019
A new generative model is introduced in which samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching; it allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons.
How to Train Your Energy-Based Models
- Computer Science, ArXiv
- 2021
This tutorial starts by explaining maximum likelihood training with Markov chain Monte Carlo (MCMC), and proceeds to elaborate on MCMC-free approaches, including Score Matching and Noise Contrastive Estimation, to highlight theoretical connections among these three approaches.
Score-Based Generative Modeling through Stochastic Differential Equations
- Computer Science, ICLR
- 2021
This work presents a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise.
SESQA: Semi-Supervised Learning for Speech Quality Assessment
- Computer Science, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2021
This work tackles automatic speech quality assessment with a semi-supervised learning approach, combining available annotations with programmatically generated data, and using 3 different optimization criteria together with 5 complementary auxiliary tasks.
DiffWave: A Versatile Diffusion Model for Audio Synthesis
- Computer Science, ICLR
- 2021
DiffWave significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity, according to various automatic and human evaluations.
WaveGrad: Estimating Gradients for Waveform Generation
- Computer Science, ICLR
- 2021
WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality.
Denoising Diffusion Probabilistic Models
- Computer Science, NeurIPS
- 2020
High quality image synthesis results are presented using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics, which naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding.
Estimation of Non-Normalized Statistical Models by Score Matching
- Computer Science, J. Mach. Learn. Res.
- 2005
While the estimation of the gradient of the log-density function is, in principle, a very difficult non-parametric problem, a surprising result is proved that gives a simple formula reducing to a sample average of a sum of certain derivatives of the log-density given by the model.