• Corpus ID: 4879286

Learning Generative Models with Sinkhorn Divergences

@inproceedings{Genevay2018LearningGM,
  title={Learning Generative Models with Sinkhorn Divergences},
  author={Aude Genevay and Gabriel Peyr{\'e} and Marco Cuturi},
  booktitle={AISTATS},
  year={2018}
}
The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those arising in computer vision or natural language. It is known that optimal transport metrics can represent a cure for this problem, since they were specifically designed as an alternative to… 
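To make the underlying quantity concrete: the Sinkhorn divergence is built from the entropy-regularized optimal transport cost, computed with Sinkhorn's matrix-scaling iterations and then debiased by subtracting the self-transport terms. The following is a minimal NumPy sketch, not the authors' implementation; the squared-Euclidean cost, the regularization strength eps, the fixed iteration count, and the toy data are illustrative assumptions, and the scaling convention for the debiased divergence varies across papers.

import numpy as np

def sinkhorn_cost(x, y, eps=0.5, n_iters=200):
    """Entropy-regularized OT cost between two uniformly weighted point clouds."""
    a = np.full(len(x), 1.0 / len(x))                      # uniform weights on x
    b = np.full(len(y), 1.0 / len(y))                      # uniform weights on y
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)     # squared-Euclidean cost matrix
    K = np.exp(-C / eps)                                   # Gibbs kernel (very small eps would need log-domain updates)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):                               # Sinkhorn matrix-scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                        # approximate transport plan
    return (P * C).sum()                                   # transport cost under P

def sinkhorn_divergence(x, y, eps=0.5):
    """Debiased divergence; vanishes when the two empirical measures coincide."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))
y = rng.normal(loc=1.0, size=(64, 2))
print(sinkhorn_divergence(x, y))                           # small for similar samples, larger otherwise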

Citations of this paper

Generative Modeling with Optimal Transport Maps

TLDR
A minimax optimization algorithm is derived to efficiently compute OT maps for the quadratic cost (Wasserstein-2 distance); the approach is extended to the case where the input and output distributions live in spaces of different dimensions, and error bounds for the computed OT map are derived.

Sinkhorn Natural Gradient for Generative Models

TLDR
It is shown that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically with respect to the desired accuracy.

Learning Generative Models using Transformations

TLDR
This thesis shows an example of incorporating a simple yet fairly representative renderer from computer graphics into IGM transformations for generating realistic and highly structured body data, which paves a new path for learning IGMs, and proposes a new generic algorithm that can be built on top of many existing approaches and brings performance improvements over the state of the art.

Sinkhorn AutoEncoders

TLDR
It is proved that optimizing the encoder over any class of universal approximators, such as deterministic neural networks, is enough to come arbitrarily close to the optimum; this framework, which holds for any metric space and prior, is advertised as a sweet spot among current generative autoencoding objectives.

Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders

TLDR
This work shows that the proposed method enables the generative model to fully leverage its deep-latent hierarchy, avoiding the well-known "latent variable collapse" issue of VAEs and thereby providing qualitatively better sample generation as well as a more interpretable latent representation than the original Wasserstein Autoencoder with the Maximum Mean Discrepancy divergence.

Distributionally Robust Games: Wasserstein Metric

  • Jian Gao, H. Tembine
  • Computer Science
    2018 International Joint Conference on Neural Networks (IJCNN)
  • 2018
TLDR
A game-theoretic framework with the Wasserstein metric is proposed for training generative models, in which the unknown data distribution is learned by dynamically optimizing the worst-case payoff, measured via the similarity between distributions.

Sliced Wasserstein Generative Models

  • Jiqing Wu, Zhiwu Huang, L. Gool
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
This paper proposes to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion and designs two types of differentiable SWD blocks to equip modern generative frameworks, namely Auto-Encoders and Generative Adversarial Networks.
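The sliced Wasserstein distance underlying this line of work reduces high-dimensional OT to one-dimensional OT along projection directions, which has a closed form via sorting. A minimal Monte Carlo sketch follows; it uses random rather than learned orthogonal projections, assumes equal sample sizes, and picks an arbitrary number of projections, so it illustrates the quantity rather than the cited method.

import numpy as np

def sliced_wasserstein2(x, y, n_proj=100, rng=None):
    """Monte Carlo sliced W2 between two equally sized samples."""
    if rng is None:
        rng = np.random.default_rng(0)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=x.shape[1])
        theta /= np.linalg.norm(theta)                     # random unit projection direction
        px, py = np.sort(x @ theta), np.sort(y @ theta)    # sorted 1-D projections
        total += np.mean((px - py) ** 2)                   # closed-form 1-D squared W2
    return np.sqrt(total / n_proj)

x = np.random.default_rng(1).normal(size=(256, 8))
y = np.random.default_rng(2).normal(loc=0.5, size=(256, 8))
print(sliced_wasserstein2(x, y))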

Parametric Adversarial Divergences are Good Losses for Generative Modeling

TLDR
It is argued that despite being “non-optimal”, parametric divergences have distinct properties from their nonparametric counterparts which can make them more suitable for learning high-dimensional distributions.

Learning High-Dimensional Distributions with Latent Neural Fokker-Planck Kernels

TLDR
This paper introduces new techniques that formulate the problem as solving a Fokker-Planck equation in a lower-dimensional latent space, aiming to mitigate challenges in the high-dimensional data space, and proposes a model that can be used to improve arbitrary GANs and effectively correct their failure cases.
...

References

SHOWING 1-10 OF 37 REFERENCES

Stochastic Optimization for Large-scale Optimal Transport

TLDR
A new class of stochastic optimization algorithms is proposed to cope with large-scale problems routinely encountered in machine learning applications; it is based on entropic regularization of the primal OT problem, which results in a smooth dual optimization problem that can be addressed with algorithms having provably faster convergence.
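The smooth dual referred to here can be written as an expectation over independently sampled source and target points, which is what makes stochastic gradient methods applicable. A rough sketch under illustrative assumptions (uniform weights, squared-Euclidean cost, a simple decaying step size; not the paper's exact algorithm, which also covers semi-discrete and continuous settings) is:

import numpy as np

rng = np.random.default_rng(0)
n, m, eps, lr = 200, 200, 0.5, 0.1
x = rng.normal(size=(n, 2))
y = rng.normal(loc=1.0, size=(m, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)         # ground cost

u, v = np.zeros(n), np.zeros(m)                             # dual potentials
for t in range(1, 20001):
    i, j = rng.integers(n), rng.integers(m)                 # sample a source/target pair
    g = 1.0 - np.exp((u[i] + v[j] - C[i, j]) / eps)         # stochastic gradient of the smooth dual
    step = lr / np.sqrt(t)
    u[i] += step * g                                        # ascend on both sampled potentials
    v[j] += step * g

# Plug the potentials into the full dual objective as a regularized-cost estimate.
dual_val = u.mean() + v.mean() - eps * np.mean(
    np.exp((u[:, None] + v[None, :] - C) / eps) - 1.0)
print(dual_val)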

From optimal transport to generative modeling: the VEGAN cookbook

TLDR
It is shown that POT for the 2-Wasserstein distance coincides with the objective heuristically employed in adversarial auto-encoders (AAE) (Makhzani et al., 2016), which provides the first theoretical justification for AAEs known to the authors.

Inference in generative models using the Wasserstein distance

TLDR
This work uses Wasserstein distances between the empirical distribution of observed data and empirical distributions of synthetic data drawn from such models to estimate their parameters, and proposes an alternative distance using the Hilbert space-filling curve.

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

The Cramer Distance as a Solution to Biased Wasserstein Gradients

TLDR
This paper describes three natural properties of probability divergences that reflect requirements from machine learning (sum invariance, scale sensitivity, and unbiased sample gradients) and proposes an alternative to the Wasserstein metric, the Cramer distance, which possesses all three desired properties.
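For a concrete point of comparison, the multivariate Cramer distance studied in that paper is closely related to the energy distance, which has a simple plug-in estimate from two samples. The sketch below uses the biased V-statistic form and one common scaling convention; both are simplifying assumptions rather than the paper's exact definition.

import numpy as np

def pairwise_dists(a, b):
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def energy_distance(x, y):
    d_xy = pairwise_dists(x, y).mean()                      # E ||X - Y||
    d_xx = pairwise_dists(x, x).mean()                      # E ||X - X'|| (biased: includes the zero diagonal)
    d_yy = pairwise_dists(y, y).mean()                      # E ||Y - Y'|| (biased: includes the zero diagonal)
    return 2.0 * d_xy - d_xx - d_yy

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 3))
y = rng.normal(loc=0.5, size=(128, 3))
print(energy_distance(x, y))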

Fast Dictionary Learning with a Smoothed Wasserstein Loss

TLDR
This work proposes to use the Wasserstein distance as the fitting error between each original point and its reconstruction, together with scalable algorithms to do so, improving not only speed but also computational stability.

Generative Moment Matching Networks

TLDR
This work formulates a method that generates an independent sample via a single feedforward pass through a multilayer perceptron, as in the recently proposed generative adversarial networks, using MMD to learn to generate codes that can then be decoded to produce samples.
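MMD, the kernel two-sample statistic that this work (and the MMD-optimization paper below) minimizes, has a closed-form estimate from two samples. A biased V-statistic sketch with a Gaussian kernel follows; the kernel choice and bandwidth are illustrative assumptions, not the cited papers' settings.

import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)     # pairwise squared distances
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 2))
y = rng.normal(loc=1.0, size=(128, 2))
print(mmd2(x, y))                                           # grows as the two samples differ more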

Wasserstein Training of Restricted Boltzmann Machines

TLDR
This work proposes a novel approach to Boltzmann machine training which assumes that a meaningful metric between observations is known; the model is trained with a Wasserstein objective rather than the Kullback-Leibler divergence, and a gradient of that distance with respect to the model parameters is derived.

Learning with a Wasserstein Loss

TLDR
An efficient learning algorithm based on an entropic regularization of the Wasserstein distance is presented, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures; the resulting loss can encourage smoothness of the predictions with respect to a chosen metric on the output space.

Training generative neural networks via Maximum Mean Discrepancy optimization

TLDR
This work considers training a deep neural network to generate samples from an unknown distribution given i.i.d. data, framing learning as the minimization of a two-sample test statistic, and proves bounds on the generalization error incurred by optimizing the empirical MMD.