Infinite-dimensional gradient-based descent for alpha-divergence minimisation

Kamélia Daudel, Randal Douc, François Portier
The Annals of Statistics
This paper introduces the $(\alpha, \Gamma)$-descent, an iterative algorithm which operates on measures and performs $\alpha$-divergence minimisation in a Bayesian framework. This gradient-based procedure extends the commonly used variational approximation by adding a prior on the variational parameters in the form of a measure. We prove that for a rich family of functions $\Gamma$, this algorithm leads at each step to a systematic decrease in the $\alpha$-divergence. Our framework recovers the…

Mixture weights optimisation for Alpha-Divergence Variational Inference

The link between Power Descent and Entropic Mirror Descent is investigated, and first-order approximations lead to the Rényi Descent, a novel algorithm for which the authors prove an $O(1/N)$ convergence rate.

Monotonic Alpha-divergence Minimisation

This paper introduces a novel iterative algorithm which carries out α-divergence minimisation by ensuring a systematic decrease in the α-divergence at each step, and sheds new light on an integrated Expectation Maximization algorithm.

Monotonic Alpha-divergence Minimisation for Variational Inference

A novel family of iterative algorithms is introduced which carry out α-divergence minimisation in a Variational Inference context by ensuring a systematic decrease at each step in the α-divergence between the variational and the posterior distributions.

Adaptive Importance Sampling meets Mirror Descent: a Bias-variance Tradeoff

Adaptive importance sampling is a widespread Monte Carlo technique that uses a re-weighting strategy to iteratively estimate the so-called target distribution. A major drawback of adaptive

Variational inference via Wasserstein gradient flows

This work proposes principled methods for VI, in which π̂ is taken to be a Gaussian or a mixture of Gaussians, which rest upon the theory of gradient flows on the Bures–Wasserstein space of Gaussian measures.

A Novel Estimator of Mutual Information for Learning to Disentangle Textual Representations

A novel variational upper bound on the mutual information between an attribute and the latent code of an encoder is introduced, leading to better disentangled representations and, in particular, more precise control of the desired degree of disentanglement than state-of-the-art methods proposed for textual data.

Safe adaptive importance sampling: A mixture approach

This paper investigates adaptive importance sampling algorithms for which the policy, the sequence of distributions used to generate the particles, is a mixture distribution between a flexible kernel

The $f$-Divergence Expectation Iteration Scheme.

Empirical results support the claim that the novel iterative algorithm, which operates on measures and performs $f$-divergence minimisation in a Bayesian framework, serves as a powerful tool to assist Variational methods.

Efficiency versus robustness: the case for minimum Hellinger distance and related methods

It is shown how and why the influence curve poorly measures the robustness properties of minimum Hellinger distance estimation. Rather, for this and related forms of estimation, there is another

Markov Processes and the H-Theorem

The H-theorem is investigated from the viewpoint of Markov processes. The proof is valid even in fields other than physics, since none of the physical relations, such as the principle of microscopic

Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen.

The theory of integral equations and the more recent investigations that followed it have been driven from the outset by the aim of carrying the theorems of algebra on systems of linear equations and

Adaptive importance sampling in Monte Carlo integration

An Adaptive Importance Sampling (AIS) scheme is introduced to compute integrals of the form as a mechanical, yet flexible, way of dealing with the selection of parameters of the importance function.

Information geometric measurements of generalisation

The extension of information divergence to positive normalisable measures reveals a remarkable relation between the dlt dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces Ld and Ldd, which offers conceptual simplification to information geometry.

A Generalization Bound for Online Variational Inference

It is shown that this is indeed the case for some variational inference (VI) algorithms, and theoretical justifications in favor of online algorithms relying on approximate Bayesian methods are presented.

Safe and adaptive importance sampling: a mixture approach

This paper investigates adaptive importance sampling algorithms for which the policy, the sequence of distributions used to generate the particles, is a mixture distribution between a flexible kernel

Bayesian estimates of equation system parameters, An application of integration by Monte Carlo

Monte Carlo (MC) is used to draw parameter values from a distribution defined on the structural parameter space of an equation system. Making use of the prior density, the likelihood, and