# Learning Generative Models with Sinkhorn Divergences

```bibtex
@inproceedings{Genevay2018LearningGM,
  title     = {Learning Generative Models with Sinkhorn Divergences},
  author    = {Aude Genevay and Gabriel Peyr{\'e} and Marco Cuturi},
  booktitle = {AISTATS},
  year      = {2018}
}
```

The ability to compare two degenerate probability distributions (i.e., two distributions supported on distinct low-dimensional manifolds embedded in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations, such as those found in computer vision or natural language. It is known that optimal transport metrics can offer a remedy for this problem, since they were specifically designed as an alternative to…
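The divergence named in the title is built from entropically regularized optimal transport, computed with Sinkhorn's matrix-scaling iterations and debiased by subtracting self-transport terms. The following is a minimal NumPy sketch of that computation for empirical point clouds, not the paper's implementation (the function names, the fixed iteration count, and the squared-Euclidean cost are illustrative choices):

```python
import numpy as np

def sinkhorn_cost(x, y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between empirical point clouds
    x (n, d) and y (m, d), via plain Sinkhorn scaling iterations."""
    n, m = len(x), len(y)
    # squared-Euclidean ground cost matrix
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                    # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform weights
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                # alternating marginal fits
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]         # approximate transport plan
    return (P * C).sum()

def sinkhorn_divergence(x, y, eps=0.1):
    """Debiased divergence: OT(x, y) - (OT(x, x) + OT(y, y)) / 2."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * (sinkhorn_cost(x, x, eps) + sinkhorn_cost(y, y, eps)))
```

In practice the iterations are run in the log domain for small `eps` to avoid numerical underflow in the Gibbs kernel.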

## 414 Citations

### Generative Modeling with Optimal Transport Maps

- Computer Science, ICLR
- 2022

A minimax optimization algorithm is derived to efficiently compute OT maps for the quadratic cost (Wasserstein-2 distance); the approach is extended to input and output distributions living in spaces of different dimensions, and error bounds are derived for the computed OT map.

### Sinkhorn Natural Gradient for Generative Models

- Computer Science, NeurIPS
- 2020

It is shown that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically with respect to the desired accuracy.

### Learning Generative Models using Transformations

- Computer Science
- 2020

This thesis gives an example of incorporating a simple yet fairly representative renderer from computer graphics into IGM transformations to generate realistic and highly structured body data, which paves a new path for learning IGMs, and proposes a new generic algorithm that can be built on top of many existing approaches to improve on the state of the art.

### Sinkhorn AutoEncoders

- Computer Science, UAI
- 2019

It is proved that optimizing the encoder over any class of universal approximators, such as deterministic neural networks, is enough to come arbitrarily close to the optimum, and the framework, which holds for any metric space and prior, is advertised as a sweet spot among current generative autoencoding objectives.

### Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders

- Computer Science, ArXiv
- 2020

This work shows that the method enables the generative model to fully leverage its deep-latent hierarchy, avoiding the well-known “latent variable collapse” issue of VAEs, thereby providing qualitatively better sample generation as well as more interpretable latent representations than the original Wasserstein Autoencoder with Maximum Mean Discrepancy divergence.

### Distributionally Robust Games: Wasserstein Metric

- Computer Science, 2018 International Joint Conference on Neural Networks (IJCNN)
- 2018

A game-theoretic framework with the Wasserstein metric is proposed for training generative models, in which the unknown data distribution is learned by dynamically optimizing the worst-case payoff, measuring the similarity between distributions.

### Sliced Wasserstein Generative Models

- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019

This paper proposes to approximate sliced Wasserstein distances (SWDs) with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion, and designs two types of differentiable SWD blocks to equip modern generative frameworks: Auto-Encoders and Generative Adversarial Networks.
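For context, the sliced Wasserstein distance averages cheap one-dimensional Wasserstein distances over projection directions. A minimal sketch with random projections follows (an illustrative simplification: the paper above instead *learns* a small set of orthogonal projections; the function name and Monte-Carlo sampling are assumptions of this sketch):

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=100, seed=0):
    """Monte-Carlo sliced 2-Wasserstein distance between equal-size
    point clouds x, y of shape (n, d), using random unit directions."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    theta = rng.normal(size=(n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    px, py = x @ theta.T, y @ theta.T       # (n, n_proj) 1-D projections
    # In 1-D, optimal transport matches sorted samples (order statistics)
    px, py = np.sort(px, axis=0), np.sort(py, axis=0)
    return np.sqrt(((px - py) ** 2).mean())
```

The sort makes each 1-D subproblem closed-form, which is what makes the sliced construction attractive compared with solving a full high-dimensional OT problem.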

### Parametric Adversarial Divergences are Good Losses for Generative Modeling

- Computer Science
- 2017

It is argued that despite being “non-optimal”, parametric divergences have distinct properties from their nonparametric counterparts which can make them more suitable for learning high-dimensional distributions.

### Learning High-Dimensional Distributions with Latent Neural Fokker-Planck Kernels

- Computer Science, ArXiv
- 2021

This paper introduces new techniques that formulate the problem as solving a Fokker-Planck equation in a lower-dimensional latent space, aiming to mitigate challenges in the high-dimensional data space, and proposes a model that can be used to improve arbitrary GANs and effectively correct failure cases of GAN models.

## References

Showing 1-10 of 37 references.

### Stochastic Optimization for Large-scale Optimal Transport

- Computer Science, NIPS
- 2016

A new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications, based on entropic regularization of the primal OT problem, which yields a smooth dual optimization problem that can be addressed with algorithms having provably faster convergence.

### From optimal transport to generative modeling: the VEGAN cookbook

- Computer Science
- 2017

It is shown that POT for the 2-Wasserstein distance coincides with the objective heuristically employed in adversarial auto-encoders (AAE) (Makhzani et al., 2016), which provides the first theoretical justification for AAEs known to the authors.

### Inference in generative models using the Wasserstein distance

- Computer Science, Mathematics
- 2017

This work uses Wasserstein distances between empirical distributions of observed data and empirical distributions of synthetic data drawn from such models to estimate their parameters, and proposes an alternative distance using the Hilbert space-filling curve.

### Generative Adversarial Nets

- Computer Science, NIPS
- 2014

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a…

### The Cramer Distance as a Solution to Biased Wasserstein Gradients

- Computer Science, ArXiv
- 2017

This paper describes three natural properties of probability divergences that the authors believe reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients; it proposes an alternative to the Wasserstein metric, the Cramer distance, which possesses all three desired properties.

### Fast Dictionary Learning with a Smoothed Wasserstein Loss

- Computer Science, AISTATS
- 2016

This work proposes to use the Wasserstein distance as the fitting error between each original point and its reconstruction, and proposes scalable algorithms to do so that improve not only speed but also computational stability.

### Generative Moment Matching Networks

- Computer Science, ICML
- 2015

This work develops a method that generates an independent sample via a single feedforward pass through a multilayer perceptron, as in the recently proposed generative adversarial networks, using MMD (maximum mean discrepancy) to learn to generate codes that can then be decoded to produce samples.
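MMD is the kernel two-sample statistic such generative moment-matching networks minimize. A minimal sketch of the (biased) squared-MMD estimator with a Gaussian kernel, purely for illustration and not the paper's implementation (the function name and fixed bandwidth are assumptions):

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    """Biased estimator of squared MMD between samples x (n, d)
    and y (m, d), using a Gaussian kernel of bandwidth sigma."""
    def k(a, b):
        # pairwise squared distances, then Gaussian kernel matrix
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

Because the statistic is differentiable in the samples, it can serve directly as a training loss for a generator network.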

### Wasserstein Training of Restricted Boltzmann Machines

- Computer Science, NIPS
- 2016

This work proposes a novel approach to Boltzmann machine training which assumes that a meaningful metric between observations is known, and derives the gradient of that distance with respect to the model parameters, departing from the usual Kullback-Leibler divergence.

### Learning with a Wasserstein Loss

- Computer Science, NIPS
- 2015

An efficient learning algorithm based on an entropic regularization of the Wasserstein loss is presented, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures; the loss can encourage smoothness of the predictions with respect to a chosen metric on the output space.

### Training generative neural networks via Maximum Mean Discrepancy optimization

- Computer Science, Mathematics, UAI
- 2015

This work considers training a deep neural network to generate samples from an unknown distribution given i.i.d. data, framing learning as an optimization problem that minimizes a two-sample test statistic, and proves bounds on the generalization error incurred by optimizing the empirical MMD.