Corpus ID: 235356249

Beyond In-Place Corruption: Insertion and Deletion In Denoising Probabilistic Models

Daniel D. Johnson, Jacob Austin, Rianne van den Berg, Daniel Tarlow
Denoising diffusion probabilistic models (DDPMs) have shown impressive results on sequence generation by iteratively corrupting each example and then learning to map corrupted versions back to the original. However, previous work has largely focused on in-place corruption, adding noise to each pixel or token individually while keeping their locations the same. In this work, we consider a broader class of corruption processes and denoising models over sequence data that can insert and delete…

Structured Denoising Diffusion Models in Discrete State-Spaces
D3PMs are diffusion-like generative models for discrete data that generalize the multinomial diffusion model of Hoogeboom et al. by going beyond corruption processes with uniform transition probabilities. The choice of transition matrix is shown to be an important design decision that leads to improved results in image and text domains.
Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive Language Models
This paper introduces two new classes of generative models for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion.
Optimal Completion Distillation for Sequence Learning
This work presents Optimal Completion Distillation, a training procedure for optimizing sequence-to-sequence models based on edit distance that achieves state-of-the-art performance on end-to-end speech recognition on both the Wall Street Journal and Librispeech datasets.
Mask-Predict: Parallel Decoding of Conditional Masked Language Models
This model improves state-of-the-art performance for non-autoregressive and parallel decoding translation models by over 4 BLEU on average, and comes within about 1 BLEU point of a typical left-to-right transformer model while decoding significantly faster.
Discrete Object Generation with Reversible Inductive Construction
This work presents a generative model for discrete objects that employs a Markov chain whose transitions are restricted to a set of local operations that preserve validity. The approach is evaluated on two highly structured discrete domains, molecules and Laman graphs, where it compares favorably to alternative methods at capturing distributional statistics.
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.
Task Loss Estimation for Sequence Prediction
This work proposes another method for deriving differentiable surrogate losses that provably meet the requirement of consistency with the task loss, and focuses on the broad class of models that define a score for every input-output pair.
Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
This work devises a way to automatically identify operations in a parallel corpus and introduces a sequence-labeling approach based on these annotations, which provides insights into the types of transformations that different approaches can model.
Imputer: Sequence Modelling via Imputation and Dynamic Programming
A tractable dynamic programming training algorithm is presented, which yields a lower bound on the log marginal likelihood of the Imputer, a neural sequence model that generates output sequences iteratively via imputations.
WaveNet: A Generative Model for Raw Audio
WaveNet, a deep neural network for generating raw audio waveforms, is introduced; it is shown that it can be efficiently trained on data with tens of thousands of samples per second of audio, and can be employed as a discriminative model, returning promising results for phoneme recognition.