Corpus ID: 235670276

Exploiting Chain Rule and Bayes' Theorem to Compare Probability Distributions

Huangjie Zheng, Mingyuan Zhou
To measure the difference between two probability distributions, referred to as the source and the target, we exploit both the chain rule and Bayes’ theorem to construct conditional transport (CT), which consists of a forward component and a backward one. The forward CT is the expected cost of moving a source data point to a target one, with their joint distribution defined by the product of the source probability density function (PDF) and a source-dependent conditional… 
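The forward CT cost described in the abstract can be sketched empirically. The snippet below is a simplified illustration, not the paper's method: it assumes a squared-Euclidean point cost and defines the source-dependent conditional as a softmax over target samples by negative cost (the paper instead learns this conditional with a "navigator"). The function name and temperature parameter are illustrative choices.

```python
import numpy as np

def forward_ct_cost(X, Y, temperature=1.0):
    """Empirical sketch of the forward conditional-transport cost.

    For each source sample x, a conditional distribution over target
    samples y is formed by a softmax of negative squared distances
    (a simplification of the paper's learned conditional). The forward
    cost is the average, over source samples, of the expected cost of
    moving x to y under that conditional.
    """
    # pairwise squared Euclidean costs, shape (n_source, n_target)
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    # source-dependent conditional over target points (rows sum to 1)
    logits = -cost / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)
    # expected moving cost under the joint p(x) * pi(y | x)
    return float((pi * cost).sum(axis=1).mean())
```

The backward CT is the symmetric quantity with the roles of source and target swapped; the paper combines both to obtain a distribution-matching objective.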


Truncated Diffusion Probabilistic Models
Experimental results show that truncated diffusion probabilistic models provide consistent improvements over non-truncated ones in terms of generation performance and the number of required inverse diffusion steps.
Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning
This work regularizes the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process, reducing the error induced by distribution mismatch.
Diffusion-GAN: Training GANs with Diffusion
A rich set of experiments on diverse datasets show that Diffusion-GAN can provide stable and data-efficient GAN training, bringing consistent performance improvement over strong GAN baselines for synthesizing photorealistic images.


Variational Inference: A Review for Statisticians
Variational inference (VI), a method from machine learning that approximates probability densities through optimization, is reviewed and a variant that uses stochastic optimization to scale up to massive data is derived.
On parameter estimation with the Wasserstein distance
These results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model, and some difficulties arising in the numerical approximation of these estimators are discussed.
Score-Based Generative Modeling through Stochastic Differential Equations
This work presents a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise.
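The two processes summarized above can be written compactly; in the standard notation of that work, the forward (noising) and reverse-time (denoising) SDEs are:

```latex
% forward (noising) SDE
\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}
% reverse-time SDE, driven by the score \nabla_{\mathbf{x}} \log p_t(\mathbf{x})
\mathrm{d}\mathbf{x} = \bigl[\mathbf{f}(\mathbf{x}, t) - g(t)^{2}\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\bigr]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}
```

Here $\mathbf{w}$ and $\bar{\mathbf{w}}$ are forward- and reverse-time Wiener processes, and the score $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ is what the generative model learns to approximate.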
Learning Generative Models with Sinkhorn Divergences
This paper presents the first tractable computational method to train large scale generative models using an optimal transport loss, and tackles three issues by relying on two key ideas: entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed point iterations; and algorithmic (automatic) differentiation of these iterations.
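The Sinkhorn fixed-point iterations mentioned above can be sketched in a few lines. This is a textbook version under assumed uniform marginals, not the paper's training pipeline; the paper's contribution is differentiating through such iterations to train generators.

```python
import numpy as np

def sinkhorn(C, eps=0.1, n_iter=200):
    """Entropy-regularized OT via Sinkhorn fixed-point iterations.

    C: (n, m) cost matrix between two uniform empirical measures.
    Returns the transport plan P = diag(u) K diag(v) and the
    transport cost <P, C>.
    """
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)   # uniform marginals
    K = np.exp(-C / eps)                              # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):                           # alternating scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return P, float((P * C).sum())
```

The entropic smoothing (the `eps` term) is what turns the original OT linear program into these simple matrix-scaling updates, at the cost of a slightly blurred transport plan.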
The Cramer Distance as a Solution to Biased Wasserstein Gradients
This paper describes three natural properties of probability divergences that the authors believe reflect requirements from machine learning — sum invariance, scale sensitivity, and unbiased sample gradients — and proposes an alternative to the Wasserstein metric, the Cramer distance, which possesses all three desired properties.
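For a concrete sense of the Cramer distance in one dimension, it is the l2 norm of the difference between the two cumulative distribution functions. The snippet below is a simple empirical sketch over two 1-D samples (the paper works with an energy-distance form for sample-based training; the function here is illustrative).

```python
import numpy as np

def cramer_distance(x, y):
    """Empirical 1-D Cramer distance: l2 norm of the CDF difference,
    integrated over the pooled support of the two samples."""
    grid = np.sort(np.concatenate([x, y]))
    # empirical CDFs evaluated on the pooled, sorted grid
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    dt = np.diff(grid)                  # widths between grid points
    return float(np.sqrt(((Fx - Fy)[:-1] ** 2 * dt).sum()))
```

Unlike the Wasserstein distance (the l1 analogue of the same CDF difference), this l2 form admits unbiased sample gradients, which is the paper's central point.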
Graphical Models, Exponential Families, and Variational Inference
The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
Machine learning - a probabilistic perspective
K. Murphy. Adaptive Computation and Machine Learning series, 2012.
This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
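The key trick in that algorithm is reparameterizing the latent sample as a deterministic function of the variational parameters and exogenous noise, which lets gradients flow through the sampling step. Below is a minimal sketch of a one-sample ELBO estimate for a diagonal-Gaussian posterior; `log_lik_fn` is a hypothetical placeholder for a decoder's log p(x | z), not part of any library API.

```python
import numpy as np

def gaussian_elbo_sample(mu, log_var, log_lik_fn, rng):
    """One-sample ELBO estimate via the reparameterization trick.

    z = mu + sigma * eps with eps ~ N(0, I), so z is differentiable
    in (mu, log_var). The KL term against a standard normal prior
    has a closed form for a diagonal Gaussian posterior.
    """
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps          # reparameterized sample
    # analytic KL( N(mu, diag(exp(log_var))) || N(0, I) )
    kl = 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var).sum()
    return log_lik_fn(z) - kl
```

In practice this estimate is computed inside an autodiff framework so that its gradient with respect to `mu` and `log_var` drives the stochastic optimization; plain numpy is used here only to keep the sketch self-contained.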
Hierarchical Implicit Models and Likelihood-Free Variational Inference
Hierarchical implicit models (HIMs) are introduced, combining the idea of implicit densities with hierarchical Bayesian modeling to define models via simulators of data with rich hidden structure, together with likelihood-free variational inference (LFVI), a scalable variational inference algorithm for HIMs.
How Well Do WGANs Estimate the Wasserstein Metric?
This work studies how well the methods used in generative adversarial networks to approximate the Wasserstein metric perform, and considers in particular the $c$-transform formulation, which eliminates the need to enforce the constraints explicitly.
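For context, the $c$-transform referred to above is the standard device from Kantorovich duality that builds the constraint into the dual objective:

```latex
\varphi^{c}(y) \;=\; \inf_{x}\,\bigl[\, c(x, y) - \varphi(x) \,\bigr],
\qquad
W_{c}(\mu, \nu) \;=\; \sup_{\varphi}\, \int \varphi \,\mathrm{d}\mu \;+\; \int \varphi^{c} \,\mathrm{d}\nu .
```

Replacing the second potential by $\varphi^{c}$ makes the supremum unconstrained, in contrast to WGAN-style training, which must enforce a Lipschitz constraint on the critic.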