Corpus ID: 229923541

Likelihood Ratio Exponential Families

@article{Brekelmans2020LikelihoodRE,
  title={Likelihood Ratio Exponential Families},
  author={Rob Brekelmans and Frank Nielsen and Alireza Makhzani and A. G. Galstyan and Greg Ver Steeg},
  journal={ArXiv},
  year={2020},
  volume={abs/2012.15480}
}
The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints [1], while the geometric mixture path is common in MCMC methods such as annealed importance sampling (AIS) [2, 3]. Linking these two ideas, recent work [4] has interpreted the geometric mixture path as an exponential family of distributions to analyse the thermodynamic variational objective (TVO) [5]. In this work, we extend likelihood… 
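The link referenced here is the identity π_β(x) ∝ π_0(x)^(1−β) π_1(x)^β = π_0(x) exp(β T(x)), with sufficient statistic T(x) = log π_1(x) − log π_0(x): the geometric path is a one-parameter exponential family with base measure π_0 and the log likelihood ratio as sufficient statistic. A minimal Python sketch, using two illustrative Gaussian endpoints of our own choosing rather than anything from the paper:

    import numpy as np
    from scipy.stats import norm

    pi0 = norm(loc=0.0, scale=1.0)   # base distribution pi_0 (illustrative)
    pi1 = norm(loc=3.0, scale=1.5)   # target distribution pi_1 (illustrative)

    def geometric_path(x, beta):
        """Unnormalized geometric mixture pi_0(x)^(1-beta) * pi_1(x)^beta."""
        return pi0.pdf(x) ** (1 - beta) * pi1.pdf(x) ** beta

    def lref(x, beta):
        """The same curve as an exponential family pi_0(x) * exp(beta * T(x)),
        with sufficient statistic T(x) = log pi_1(x) - log pi_0(x)."""
        T = pi1.logpdf(x) - pi0.logpdf(x)
        return pi0.pdf(x) * np.exp(beta * T)

    x = np.linspace(-5.0, 8.0, 7)
    assert np.allclose(geometric_path(x, 0.4), lref(x, 0.4))

The normalizer of this family plays the role of a log-partition function, which is what lets exponential-family tools such as convex duality and Bregman divergences be applied to annealing paths.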

Citations

q-Paths: Generalizing the Geometric Annealing Path using Power Means

This work introduces q-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics.
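As a rough sketch of the construction, the q-path replaces the weighted geometric mean of the endpoint densities with a power mean of order 1 − q; the Gaussian endpoints below are an illustrative assumption:

    import numpy as np
    from scipy.stats import norm

    pi0, pi1 = norm(0.0, 1.0), norm(3.0, 1.5)  # illustrative endpoints

    def q_path(x, beta, q):
        """Unnormalized q-path: power mean of order (1 - q) between pi_0
        and pi_1 with weights (1 - beta, beta)."""
        if np.isclose(q, 1.0):
            # q -> 1 recovers the geometric mixture path as a limit.
            return np.exp((1 - beta) * pi0.logpdf(x) + beta * pi1.logpdf(x))
        r = 1.0 - q
        return ((1 - beta) * pi0.pdf(x) ** r + beta * pi1.pdf(x) ** r) ** (1 / r)

    x = np.linspace(-4.0, 7.0, 5)
    print(q_path(x, 0.5, 1.0))  # geometric mixture (special case q = 1)
    print(q_path(x, 0.5, 0.0))  # arithmetic mixture (special case q = 0)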

Revisiting Chernoff Information with Likelihood Ratio Exponential Families

This paper revisits the Chernoff information between two densities of a measurable Lebesgue space by considering the exponential families induced by their geometric mixtures: the so-called likelihood ratio exponential families.
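For intuition, the Chernoff information is C(p, q) = −min over 0 < α < 1 of log ∫ p(x)^α q(x)^(1−α) dx, an optimization along exactly this geometric mixture. A numerical sketch with two illustrative unit-variance Gaussians, for which the known answer is (μ₁ − μ₀)²/8 = 0.5:

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad
    from scipy.optimize import minimize_scalar

    p, q = norm(0.0, 1.0), norm(2.0, 1.0)  # illustrative densities

    def log_bhattacharyya(alpha):
        """log of int p(x)^alpha q(x)^(1-alpha) dx along the geometric mixture."""
        f = lambda x: np.exp(alpha * p.logpdf(x) + (1 - alpha) * q.logpdf(x))
        val, _ = quad(f, -np.inf, np.inf)
        return np.log(val)

    res = minimize_scalar(log_bhattacharyya, bounds=(1e-6, 1 - 1e-6),
                          method="bounded")
    print("Chernoff information ~", -res.fun, "at alpha* ~", res.x)  # ~0.5, 0.5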

Rho-Tau Bregman Information and the Geometry of Annealing Paths

Markov chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path.

On a Variational Definition for the Jensen-Shannon Symmetrization of Distances Based on the Information Radius

We generalize the Jensen-Shannon divergence and the Jensen-Shannon diversity index by considering a variational definition with respect to a generic mean, thereby extending the notion of Sibson’s information radius.
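Concretely, the classical identity being generalized is Sibson's information radius, which characterizes the Jensen-Shannon divergence variationally:

    \mathrm{JS}(p, q) \;=\; \min_{r} \; \tfrac{1}{2}\,\mathrm{KL}(p \,\|\, r) \;+\; \tfrac{1}{2}\,\mathrm{KL}(q \,\|\, r),

whose minimizer is the arithmetic mean r* = (p + q)/2; the cited work replaces this arithmetic mean with a generic mean.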

Beyond scalar quasi-arithmetic means: Quasi-arithmetic averages and quasi-arithmetic mixtures in information geometry

It is shown how quasi-arithmetic averages are used to express points on dual geodesics and sided barycenters in the dual affine coordinate systems, and several parametric and non-parametric statistical models are described which are closed under the quasi-arithmetic mixture operation.
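A quasi-arithmetic mean is defined by a strictly monotone generator f as M_f(x; w) = f⁻¹(Σᵢ wᵢ f(xᵢ)). A small self-contained sketch; the generator choices are the standard textbook ones rather than anything specific to this paper:

    import numpy as np

    def quasi_arithmetic_mean(x, w, f, f_inv):
        """M_f(x; w) = f^{-1}( sum_i w_i * f(x_i) ) for monotone generator f."""
        x, w = np.asarray(x, float), np.asarray(w, float)
        return f_inv(np.sum(w * f(x)))

    x, w = [1.0, 4.0, 16.0], [1/3, 1/3, 1/3]
    print(quasi_arithmetic_mean(x, w, lambda t: t, lambda t: t))      # arithmetic: 7.0
    print(quasi_arithmetic_mean(x, w, np.log, np.exp))                # geometric: 4.0
    print(quasi_arithmetic_mean(x, w, lambda t: 1/t, lambda t: 1/t))  # harmonic: ~2.29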

References

Showing 1–10 of 38 references

All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference

An exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods is proposed, which expresses the gap in TVO likelihood bounds as a sum of KL divergences and derives a doubly reparameterized gradient estimator that improves model learning and allows the TVO to benefit from more refined bounds.
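As a sketch of the objective itself: the TVO lower-bounds log p(x) by a left Riemann sum of the integrand E under π_β of [log p(x, z) − log q(z|x)] over a grid of β values, and the left sum is a lower bound because this integrand is nondecreasing in β (its derivative is a variance). The self-normalized importance sampling estimator below is an illustrative choice, not necessarily the paper's exact implementation:

    import numpy as np

    def tvo_lower_bound(log_joint, log_q, betas):
        """Left-Riemann TVO bound from S samples z_s ~ q(z|x).
        Expectations under pi_beta propto q^(1-beta) * p^beta are estimated
        by self-normalized importance sampling (SNIS) from q."""
        llr = log_joint - log_q                   # log likelihood ratio per sample
        bound = 0.0
        for b0, b1 in zip(betas[:-1], betas[1:]):
            logw = b0 * llr                       # SNIS weights for pi_{b0}
            w = np.exp(logw - logw.max())
            w /= w.sum()
            bound += (b1 - b0) * np.sum(w * llr)  # (delta beta) * E_{pi_b0}[llr]
        return bound

    rng = np.random.default_rng(0)
    llr = rng.normal(-1.0, 0.5, size=1000)        # toy log-ratios (log_q = 0)
    print(tvo_lower_bound(llr, np.zeros(1000), np.linspace(0, 1, 11)))

With a single interval (betas = [0, 1]) the bound reduces to the usual ELBO, and refining the grid tightens it toward log p(x).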

Graphical Models, Exponential Families, and Variational Inference

The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.

Annealed importance sampling

It is shown how one can use the Markov chain transitions for such an annealing sequence to define an importance sampler, which can be seen as a generalization of a recently proposed variant of sequential importance sampling.
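A hedged sketch of the estimator: run a Markov chain through the geometric path, accumulating an importance weight at each temperature change. The random-walk Metropolis kernel and unnormalized Gaussian endpoints below are illustrative assumptions:

    import numpy as np

    def ais_log_weight(x0, log_pi0, log_pi1, transition, betas, rng):
        """One AIS chain along pi_beta propto pi0^(1-b) * pi1^b; `transition`
        must leave pi_beta invariant. Returns the log importance weight."""
        x, logw = x0, 0.0
        llr = lambda y: log_pi1(y) - log_pi0(y)
        for b_prev, b in zip(betas[:-1], betas[1:]):
            logw += (b - b_prev) * llr(x)  # weight update at the new temperature
            x = transition(x, b, rng)      # move, keeping pi_b invariant
        return logw

    def rw_metropolis(logp, x, rng, step=0.5):
        prop = x + step * rng.normal()
        return prop if np.log(rng.uniform()) < logp(prop) - logp(x) else x

    rng = np.random.default_rng(0)
    log_pi0 = lambda y: -0.5 * y**2             # N(0,1), unnormalized
    log_pi1 = lambda y: -0.5 * (y - 3.0)**2     # N(3,1), unnormalized
    trans = lambda x, b, r: rw_metropolis(
        lambda y: (1 - b) * log_pi0(y) + b * log_pi1(y), x, r)
    logws = [ais_log_weight(rng.normal(), log_pi0, log_pi1, trans,
                            np.linspace(0, 1, 51), rng) for _ in range(200)]
    print("log Z1/Z0 ~", np.log(np.mean(np.exp(logws))))  # true value 0 here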

Annealing between distributions by averaging moments

A novel sequence of intermediate distributions for exponential families, defined by averaging the moments of the initial and target distributions, is presented, and an asymptotically optimal piecewise linear schedule is derived.
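To see the difference on the simplest case: for a one-dimensional Gaussian the mean parameters are (E[x], E[x²]), and the moment-averaged intermediate interpolates these rather than the natural parameters. A toy rendering of the idea, not the paper's general algorithm:

    import numpy as np

    def moment_averaged_gaussian(m0, s0, m1, s1, beta):
        """1-D Gaussian whose mean parameters (E[x], E[x^2]) are convex
        combinations of the endpoints' moments."""
        Ex  = (1 - beta) * m0 + beta * m1
        Ex2 = (1 - beta) * (s0**2 + m0**2) + beta * (s1**2 + m1**2)
        return Ex, np.sqrt(Ex2 - Ex**2)   # mean and std of pi_beta

    print(moment_averaged_gaussian(0.0, 1.0, 10.0, 1.0, 0.5))  # (5.0, ~5.10)

Unlike the geometric path, whose intermediate Gaussians in this example would keep standard deviation 1, the moment-averaged intermediates broaden to cover both endpoints.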

DEMI: Discriminative Estimator of Mutual Information

It is shown theoretically that DEMI and other variational approaches are equivalent when they achieve their optimum, while DEMI itself does not optimize a variational bound.

On Variational Bounds of Mutual Information

This work introduces a continuum of lower bounds that encompasses previous bounds and flexibly trades off bias and variance and demonstrates the effectiveness of these new bounds for estimation and representation learning.
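One member of that continuum is the InfoNCE lower bound, sketched below; the jointly Gaussian toy pair and the closed-form critic are illustrative assumptions (for Gaussians with correlation ρ the true MI is −½ log(1 − ρ²)):

    import numpy as np

    def infonce_lower_bound(scores):
        """InfoNCE bound from a K x K critic matrix scores[i, j] = f(x_i, y_j),
        where (x_i, y_i) are the positive pairs. Low variance, capped at log K."""
        K = scores.shape[0]
        m = scores.max(axis=1, keepdims=True)
        lse = np.log(np.exp(scores - m).sum(axis=1)) + m[:, 0]
        return np.mean(np.diag(scores) - lse) + np.log(K)

    rng = np.random.default_rng(0)
    K, rho = 256, 0.9
    x = rng.normal(size=K)
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=K)
    # Optimal critic for this toy joint: log p(y|x) - log p(y), up to constants.
    f = lambda xi, yj: -0.5 * (yj - rho * xi)**2 / (1 - rho**2) + 0.5 * yj**2
    scores = f(x[:, None], y[None, :])
    print(infonce_lower_bound(scores), "<= true MI =", -0.5 * np.log(1 - rho**2))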

Simulating Normalizing Constants: From Importance Sampling to Bridge Sampling to Path Sampling

It is shown that the acceptance ratio method and thermodynamic integration are natural generalizations of importance sampling, which is most familiar to statistical audiences.
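The unifying identity is the path sampling formula for a ratio of normalizing constants,

    \log \frac{Z_1}{Z_0}
      \;=\; \int_0^1 \mathbb{E}_{\pi_\beta}\!\left[ \frac{\partial}{\partial \beta} \log \tilde{\pi}_\beta(x) \right] \mathrm{d}\beta
      \;=\; \int_0^1 \mathbb{E}_{\pi_\beta}\!\left[ \log \frac{\tilde{\pi}_1(x)}{\tilde{\pi}_0(x)} \right] \mathrm{d}\beta,

where the second equality specializes to the geometric path; importance sampling uses only the endpoints, bridge sampling adds a single intermediate, and path sampling integrates along the whole continuum.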

The Information Bottleneck EM Algorithm

The resulting Information Bottleneck Expectation Maximization (IB-EM) algorithm finds solutions that are superior to those of standard EM methods.

Fixing a Broken ELBO

This framework derives variational lower and upper bounds on the mutual information between the input and the latent variable, and uses these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy.
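In a common per-datapoint form (the paper itself works with aggregate, marginal versions), the decomposition splits the negative ELBO into these two terms:

    -\mathrm{ELBO}(x)
      \;=\; \underbrace{\mathbb{E}_{q(z \mid x)}\!\left[ -\log p(x \mid z) \right]}_{\text{distortion } D}
      \;+\; \underbrace{\mathrm{KL}\!\left( q(z \mid x) \,\|\, p(z) \right)}_{\text{rate } R},

so models with identical ELBO values can occupy very different points on the rate-distortion curve.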

Deterministic annealing for clustering, compression, classification, regression, and related optimization problems

K. Rose, Proc. IEEE, 1998

The deterministic annealing approach to clustering and its extensions have demonstrated substantial performance improvement over standard supervised and unsupervised learning methods in a variety of applications.