• Corpus ID: 22775291

Density Estimation in Infinite Dimensional Exponential Families

@article{sriperumbudur_density_estimation,
  title={Density Estimation in Infinite Dimensional Exponential Families},
  author={Bharath K. Sriperumbudur and Kenji Fukumizu and Arthur Gretton and Aapo Hyv{\"a}rinen and Revant Kumar},
  journal={J. Mach. Learn. Res.}
}
In this paper, we consider an infinite dimensional exponential family $\mathcal{P}$ of probability densities, parametrized by functions in a reproducing kernel Hilbert space (RKHS) $H$, and show it to be quite rich in the sense that a broad class of densities on $\mathbb{R}^d$ can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements of $\mathcal{P}$. The main goal of the paper is to estimate an unknown density $p_0$ through an element of $\mathcal{P}$. Standard… 
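Concretely, each element of $\mathcal{P}$ is an exponential tilt of a base density $q_0$ by a function $f \in H$, i.e. $p_f(x) \propto q_0(x)\exp(f(x))$, with $f$ typically represented as a kernel expansion. A minimal sketch of evaluating such an unnormalized density (the Gaussian kernel, the standard normal base density, and all names here are illustrative choices, not prescribed by the paper):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) on R^d."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def unnormalized_density(x, centers, alphas, sigma=1.0):
    """Evaluate q0(x) * exp(f(x)), where f(x) = sum_i alphas[i] * k(centers[i], x)
    lies in the RKHS of the kernel and q0 is a standard normal base density
    (an illustrative choice)."""
    f = sum(a * gaussian_kernel(z, x, sigma) for a, z in zip(alphas, centers))
    q0 = np.exp(-0.5 * np.sum(x ** 2)) / (2.0 * np.pi) ** (len(x) / 2.0)
    return q0 * np.exp(f)
```

The normalizing constant of $p_f$ is unavailable in closed form, which is why direct maximum likelihood is hard here and the coefficients would instead be fit by a method such as score matching.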


  • A. Rao
  • Mathematics, Computer Science
  • 2019
This work considers three different estimators of $p_0$ and shows through numerical simulations that KDE performs better in the univariate case, while the other two methods perform better in high-dimensional settings.
Exponential Series Approaches for Nonparametric Graphical Models
This thesis proposes the method of exponential series, which approximates the log-density by a finite-dimensional exponential family with the number of sufficient statistics increasing with the sample size, and proposes a variational approximation to the likelihood based on tree-reweighted nonparametric message passing.
Learning structured densities via infinite dimensional exponential families
This paper studies the problem of estimating the structure of a probabilistic graphical model without assuming a particular parametric model, and shows how to efficiently minimize the proposed objective using existing group lasso solvers.
Estimating Probability Distributions and their Properties
The derivation of minimax convergence rates is considered, which may help explain why these tools appear to perform well on problems that are intractable from the traditional perspectives of nonparametric statistics.
A Computationally Efficient Method for Learning Exponential Family Distributions
This work proposes a computationally efficient estimator that is consistent as well as asymptotically normal under mild conditions and shows that, at the population level, this method can be viewed as the maximum likelihood estimation of a re-parameterized distribution belonging to the same class of exponential family.
A Statistical Taylor Theorem and Extrapolation of Truncated Densities
This work shows a statistical version of Taylor’s theorem and applies it to non-parametric density estimation from truncated samples, which works under the hard truncation model, where the samples outside some survival set S are never observed, and applies to multiple dimensions.
Efficient and principled score estimation with Nyström kernel exponential families
Compared to an existing score learning approach using a denoising autoencoder, the estimator is empirically more data-efficient when estimating the score, runs faster, and has fewer parameters (which can be tuned in a principled and interpretable way), in addition to providing statistical guarantees.
Kernel Exponential Family Estimation via Doubly Dual Embedding
A connection between kernel exponential family estimation and MMD-GANs is established, revealing a new perspective for understanding GANs and it is shown that the proposed estimator empirically outperforms state-of-the-art estimators.
Fisher Efficient Inference of Intractable Models
This paper derives a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation and a Stein operator and proves its consistency and shows that the asymptotic variance of its solution can attain the equality of the efficiency bound under mild regularity conditions.
Efficient and principled score estimation
We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional.


Analogous convergence results for the relative entropy are shown to hold in general, for any class of log-density functions and sequence of finite-dimensional linear spaces having L2 and L∞ approximation properties.
Exponential manifold by reproducing kernel Hilbert spaces
  • Mathematics
  • 2008
The purpose of this paper is to propose a method of constructing exponential families as Hilbert manifolds, on which estimation theory can be built. Although there have been works on infinite…
Mathematical Methods for Supervised Learning
The main focus is to understand the rate of approximation, measured either in expectation or in probability, that can be obtained under a given prior $f_\rho \in \Theta$, and what the possible algorithms are for obtaining optimal or semi-optimal results.
An Infinite-Dimensional Geometric Structure on the Space of all the Probability Measures Equivalent to a Given One
Let M_μ be the set of all probability densities equivalent to a given reference probability measure μ. This set is thought of as the maximal regular (i.e., with strictly positive densities)…
Mercer’s Theorem on General Domains: On the Interaction between Measures, Kernels, and RKHSs
Given a compact metric space X and a strictly positive Borel measure ν on X, Mercer’s classical theorem states that the spectral decomposition of a positive self-adjoint integral operator…
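Mercer’s decomposition has a simple discrete analogue that is easy to check numerically: the eigenvalues of the Gram matrix $K/n$, built on points drawn from ν, approximate the eigenvalues of the integral operator defined by the kernel and ν. A sketch (the Gaussian kernel and the uniform measure on [0, 1] are illustrative choices):

```python
import numpy as np

# Discrete analogue of Mercer's decomposition: eigenvalues of the Gram
# matrix K/n on n points drawn from nu approximate the eigenvalues of the
# integral operator (T_k g)(x) = integral of k(x, y) g(y) dnu(y).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)                      # sample from nu
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)        # Gaussian kernel Gram matrix
eigvals = np.sort(np.linalg.eigvalsh(K / len(x)))[::-1]  # descending spectrum
```

Because k(x, x) = 1 here, the eigenvalues of K/n sum to 1 (the trace of K/n), and they decay rapidly, mirroring the smoothness of the kernel.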
Estimation of Non-Normalized Statistical Models by Score Matching
While the estimation of the gradient of the log-density function is, in principle, a very difficult nonparametric problem, a surprising result is proved that gives a simple formula, which reduces to a sample average of a sum of certain derivatives of the log-density given by the model.
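The score-matching idea summarized above can be made concrete in one dimension. For the unnormalized model $p(x;\theta) \propto \exp(-\theta x^2/2)$, the model score is $\psi(x;\theta) = -\theta x$, so Hyvärinen’s objective $J(\theta) = \mathbb{E}[\tfrac{1}{2}\psi^2 + \psi']$ has a closed-form minimizer. A sketch (the model and all names are illustrative choices):

```python
import numpy as np

def score_matching_precision(x):
    """Closed-form score-matching estimate of theta in the unnormalized
    model p(x; theta) ~ exp(-theta * x**2 / 2), a zero-mean Gaussian with
    precision theta.  The model score is psi(x) = -theta * x, so the
    empirical objective J(theta) = mean(0.5 * theta**2 * x**2 - theta)
    is minimized at theta_hat = 1 / mean(x**2); no normalizing constant
    is ever needed."""
    return 1.0 / np.mean(x ** 2)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 2.0, size=100_000)   # true precision = 1/4
theta_hat = score_matching_precision(samples)  # close to 0.25
```

The point of the construction is that both $\psi$ and $\psi'$ depend only on the unnormalized model, so the intractable partition function never appears.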
On regularization algorithms in learning theory
Hilbert Space Embeddings and Metrics on Probability Measures
It is shown that the distance between distributions under γk results from an interplay between the properties of the kernel and the distributions, by demonstrating that distributions are close in the embedding space when their differences occur at higher frequencies.
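The kernel distance γk discussed above (the maximum mean discrepancy, MMD) has a simple plug-in estimator from two samples. A minimal sketch of the biased empirical estimate of the squared MMD under a Gaussian kernel (the bandwidth is an illustrative choice):

```python
import numpy as np

def mmd_squared(X, Y, sigma=1.0):
    """Biased empirical estimate of the squared MMD between samples
    X ~ P and Y ~ Q (rows are points), using a Gaussian kernel."""
    def gram(A, B):
        d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()
```

The estimate is zero when the two samples coincide and grows as their differences become visible at the kernel’s scale; for a characteristic kernel such as the Gaussian, the population quantity vanishes only when P = Q.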
Optimal Rates for the Regularized Least-Squares Algorithm
A complete minimax analysis of the problem is described, showing that the convergence rates obtained by regularized least-squares estimators are indeed optimal over a suitable class of priors defined by the considered kernel.
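The regularized least-squares estimator analyzed there is kernel ridge regression: solve $(K + n\lambda I)\alpha = y$ and predict with $f(x) = \sum_i \alpha_i k(x_i, x)$. A minimal sketch (Gaussian kernel; λ and the bandwidth are illustrative choices):

```python
import numpy as np

def krr_fit_predict(X, y, X_test, lam=1e-3, sigma=1.0):
    """Regularized least squares (kernel ridge regression) with a Gaussian
    kernel: alpha = (K + n * lam * I)^{-1} y, then f(x) = sum_i alpha_i k(x_i, x).
    Rows of X and X_test are points."""
    def gram(A, B):
        d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
        return np.exp(-d2 / (2.0 * sigma ** 2))
    n = len(X)
    alpha = np.linalg.solve(gram(X, X) + n * lam * np.eye(n), y)
    return gram(X_test, X) @ alpha
```

The minimax analysis in this line of work concerns how λ should shrink with the sample size, as a function of the smoothness class of the target, to attain the optimal convergence rate.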