# Exploiting Chain Rule and Bayes' Theorem to Compare Probability Distributions

```bibtex
@inproceedings{Zheng2021ExploitingCR,
  title     = {Exploiting Chain Rule and Bayes' Theorem to Compare Probability Distributions},
  author    = {Huangjie Zheng and Mingyuan Zhou},
  booktitle = {NeurIPS},
  year      = {2021}
}
```

To measure the difference between two probability distributions, referred to as the source and target, respectively, we exploit both the chain rule and Bayes’ theorem to construct conditional transport (CT), which is constituted by both a forward component and a backward one. The forward CT is the expected cost of moving a source data point to a target one, with their joint distribution defined by the product of the source probability density function (PDF) and a source-dependent conditional…
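The forward CT cost described above can be sketched numerically on empirical samples. This is a minimal illustration, not the paper's full method: the squared Euclidean cost and the softmax conditional over target samples are illustrative assumptions (the paper learns a parameterized conditional rather than fixing one).

```python
import numpy as np

def forward_ct_cost(source, target):
    """Empirical forward conditional-transport cost.

    For each source point x, a conditional distribution pi(y|x) over
    target points, here taken proportional to exp(-c(x, y)), decides
    where its mass moves; the cost is E_x E_{y ~ pi(.|x)} c(x, y).
    """
    # pairwise squared Euclidean costs c(x_i, y_j)
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    # source-dependent conditional: softmax over target points per row
    logits = -cost
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)
    # expected moving cost, averaged uniformly over source points
    return float((pi * cost).sum(axis=1).mean())

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(64, 2))  # source samples
y = rng.normal(0.0, 1.0, size=(64, 2))  # target samples
print(forward_ct_cost(x, y))            # a nonnegative scalar cost
```

The backward CT would mirror this with the roles of source and target exchanged, with the joint defined via Bayes' theorem; it is omitted here for brevity.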

## 3 Citations

Truncated Diffusion Probabilistic Models

- Mathematics, Computer Science · ArXiv
- 2022

Experimental results show the truncated diffusion probabilistic models provide consistent improvements over the non-truncated ones in terms of the generation performance and the number of required inverse diffusion steps.

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning

- Computer Science · ICML
- 2022

This work regularizes the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process, reducing the error induced by distribution mismatch.

Diffusion-GAN: Training GANs with Diffusion

- Computer Science · ArXiv
- 2022

A rich set of experiments on diverse datasets show that Diffusion-GAN can provide stable and data-efficient GAN training, bringing consistent performance improvement over strong GAN baselines for synthesizing photorealistic images.

## References

Showing 1-10 of 79 references

Variational Inference: A Review for Statisticians

- Computer Science · ArXiv
- 2016

Variational inference (VI), a method from machine learning that approximates probability densities through optimization, is reviewed and a variant that uses stochastic optimization to scale up to massive data is derived.

On parameter estimation with the Wasserstein distance

- Mathematics, Computer Science · Information and Inference: A Journal of the IMA
- 2019

These results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model, and some difficulties arising in the numerical approximation of these estimators are discussed.

Score-Based Generative Modeling through Stochastic Differential Equations

- Computer Science · ICLR
- 2021

This work presents a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise.
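The forward (noising) direction of such an SDE can be sketched with an Euler-Maruyama simulation. This is a minimal sketch under an assumed variance-preserving SDE dx = -0.5 β x dt + sqrt(β) dW with a constant β; published models typically use a time-dependent schedule β(t), which is omitted here to keep the example short.

```python
import numpy as np

def forward_vp_sde(x0, beta=10.0, steps=1000, rng=None):
    """Euler-Maruyama simulation of dx = -0.5*beta*x dt + sqrt(beta) dW
    on t in [0, 1]; with enough noise injected, the samples approach a
    standard Gaussian regardless of the distribution x0 came from."""
    rng = rng or np.random.default_rng(0)
    dt = 1.0 / steps
    x = x0.copy()
    for _ in range(steps):
        drift = -0.5 * beta * x
        diffusion = np.sqrt(beta * dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + diffusion
    return x

x0 = np.full((5000, 1), 3.0)  # a point mass far from the origin
xT = forward_vp_sde(x0)
# after full noising, the sample mean is near 0 and the std near 1
print(xT.mean(), xT.std())
```

Generative modeling then amounts to learning the score of the noised marginals so that the corresponding reverse-time SDE can be simulated; that learned component is what the sketch leaves out.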

Learning Generative Models with Sinkhorn Divergences

- Computer Science · AISTATS
- 2018

This paper presents the first tractable computational method to train large scale generative models using an optimal transport loss, and tackles three issues by relying on two key ideas: entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed point iterations; and algorithmic (automatic) differentiation of these iterations.
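The Sinkhorn fixed-point iterations mentioned in this summary can be sketched in a few lines. The histograms, cost matrix, and regularization strength below are illustrative choices, not values from the paper.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.5, iters=200):
    """Entropy-regularized optimal transport between histograms a and b
    via Sinkhorn fixed-point iterations; returns the transport plan."""
    K = np.exp(-cost / eps)  # Gibbs kernel from entropic smoothing
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)    # rescale so column marginals match b
        u = a / (K @ v)      # rescale so row marginals match a
    return u[:, None] * K * v[None, :]

x = np.linspace(0.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2  # squared-distance ground cost
a = np.full(5, 0.2)                    # uniform source histogram
b = np.full(5, 0.2)                    # uniform target histogram
P = sinkhorn(a, b, cost)
print(P.sum())  # the plan's total mass is 1
```

Because every operation is differentiable, the iterations can be unrolled and backpropagated through, which is the "algorithmic (automatic) differentiation" idea the summary refers to.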

The Cramer Distance as a Solution to Biased Wasserstein Gradients

- Computer Science · ArXiv
- 2017

This paper describes three natural properties of probability divergences that it believes reflect requirements from machine learning: sum invariance, scale sensitivity, and unbiased sample gradients and proposes an alternative to the Wasserstein metric, the Cramer distance, which possesses all three desired properties.
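In one dimension the Cramer distance is the L2 distance between the two cumulative distribution functions, which makes a minimal empirical estimate easy to write down. The grid-based approximation below is an illustrative sketch, not the paper's estimator.

```python
import numpy as np

def cramer_distance(xs, ys, grid):
    """Empirical 1-D Cramer distance: the L2 norm of the difference
    between the two samples' empirical CDFs, approximated on a grid."""
    Fx = (xs[None, :] <= grid[:, None]).mean(axis=1)  # CDF of xs on grid
    Fy = (ys[None, :] <= grid[:, None]).mean(axis=1)  # CDF of ys on grid
    dx = grid[1] - grid[0]
    return np.sqrt(np.sum((Fx - Fy) ** 2) * dx)

grid = np.linspace(-1.0, 2.0, 3001)
# scale sensitivity: point masses separated by 1 sit at distance ~1
d = cramer_distance(np.zeros(4), np.ones(4), grid)
print(d)
```

Note the contrast the paper draws: this quantity admits unbiased sample gradients, whereas minibatch Wasserstein gradients are biased.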

Graphical Models, Exponential Families, and Variational Inference

- Computer Science · Found. Trends Mach. Learn.
- 2008

The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.

Machine learning - a probabilistic perspective

- Computer Science · Adaptive computation and machine learning series
- 2012

This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

Auto-Encoding Variational Bayes

- Computer Science · ICLR
- 2014

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

Hierarchical Implicit Models and Likelihood-Free Variational Inference

- Computer Science · NIPS
- 2017

Hierarchical implicit models (HIMs), which combine the idea of implicit densities with hierarchical Bayesian modeling to define models via simulators of data with rich hidden structure, are introduced, along with likelihood-free variational inference (LFVI), a scalable variational inference algorithm for HIMs.

How Well Do WGANs Estimate the Wasserstein Metric?

- Computer Science · ArXiv
- 2019

This work studies how well the methods used in generative adversarial networks to approximate the Wasserstein metric perform, and considers, in particular, the $c$-transform formulation, which eliminates the need to enforce the constraints explicitly.