Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization

@article{Nguyen2010EstimatingDF,
  title={Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization},
  author={XuanLong Nguyen and Martin J. Wainwright and Michael I. Jordan},
  journal={IEEE Transactions on Information Theory},
  year={2010},
  volume={56},
  number={11},
  pages={5847-5861}
}
We develop and analyze M-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a nonasymptotic variational characterization of f-divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these estimators.
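To make the variational approach concrete, the sketch below estimates KL(P||Q) and the likelihood ratio dP/dQ from samples by maximizing the empirical form of the bound D_f(P,Q) >= E_P[g] - E_Q[f*(g)], with f(t) = t log t and conjugate f*(v) = exp(v - 1). The Gaussian-bump function class, the toy one-dimensional data, and the BFGS solver are illustrative assumptions, not the paper's exact construction.

    # Sketch: variational M-estimation of KL(P||Q) and the ratio dP/dQ.
    # With f(t) = t*log(t) and f*(v) = exp(v - 1),
    #   KL(P||Q) = sup_g  E_P[g] - E_Q[exp(g - 1)],
    # and the optimal g satisfies exp(g - 1) = dP/dQ.  The feature class and
    # data below are illustrative assumptions, not the paper's construction.
    import numpy as np
    from scipy.optimize import minimize

    def features(x, centers, width=1.0):
        """Gaussian bump features phi(x); the estimate is g(x) = theta . phi(x)."""
        d2 = (x[:, None] - centers[None, :]) ** 2
        return np.exp(-d2 / (2.0 * width ** 2))

    def neg_objective(theta, phi_p, phi_q):
        """Negative empirical variational objective; convex in theta."""
        g_p = phi_p @ theta                              # g on samples from P
        g_q = phi_q @ theta                              # g on samples from Q
        exp_term = np.exp(np.minimum(g_q - 1.0, 50.0))   # guard against overflow
        return -(g_p.mean() - exp_term.mean())

    rng = np.random.default_rng(0)
    x_p = rng.normal(1.0, 1.0, size=2000)   # samples from P = N(1, 1)
    x_q = rng.normal(0.0, 1.0, size=2000)   # samples from Q = N(0, 1); KL(P||Q) = 0.5

    centers = np.linspace(-3.0, 4.0, 15)
    phi_p, phi_q = features(x_p, centers), features(x_q, centers)

    res = minimize(neg_objective, np.zeros(len(centers)),
                   args=(phi_p, phi_q), method="BFGS")
    kl_hat = -res.fun
    ratio_hat = lambda x: np.exp(features(x, centers) @ res.x - 1.0)  # estimate of dP/dQ

    print(f"estimated KL(P||Q): {kl_hat:.3f}   (true value 0.5)")

Because the objective is linear minus an exponential of a linear term in the coefficients, any restricted class of this form yields a standard convex program, which is the point made in the abstract.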
Citations

Improving convergence of divergence functional ensemble estimators
The theory of optimally weighted ensemble estimation is generalized to derive two estimators that achieve the parametric rate when the densities are sufficiently smooth, as well as an empirical estimator of Rényi-α divergence that outperforms the standard kernel density plug-in estimator, especially in higher dimensions.
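For intuition about the ensemble idea described above, the following sketch shows the weight computation at its heart: base estimates computed at several bandwidth indices are combined with weights that sum to one and cancel the leading bias terms, with a minimum-norm solution to keep the variance contribution small. The bias basis l^(i/d), the parameter grid, and the toy base estimates are assumptions for illustration; the cited paper's actual optimization differs in detail.

    # Sketch: weights for an ensemble of base estimators whose biases expand
    # in known functions psi_i(l) of a bandwidth index l.  The weights sum to
    # one and cancel the leading bias terms; the minimum-norm solution keeps
    # the variance contribution small.  psi_i(l) = l**(i/d) is illustrative.
    import numpy as np

    def ensemble_weights(param_grid, d, n_bias_terms):
        l = np.asarray(param_grid, dtype=float)
        rows = [np.ones_like(l)] + [l ** (i / d) for i in range(1, n_bias_terms + 1)]
        a = np.vstack(rows)                   # constraint matrix
        b = np.zeros(len(rows)); b[0] = 1.0   # sum-to-one; bias terms -> 0
        return np.linalg.pinv(a) @ b          # minimum-norm weights solving a w = b

    params = np.arange(1, 11)                 # hypothetical bandwidth indices
    w = ensemble_weights(params, d=4, n_bias_terms=3)
    base_estimates = np.random.default_rng(0).normal(0.8, 0.05, size=len(params))
    print("weights sum to", round(float(w.sum()), 6),
          "; ensemble estimate:", round(float(w @ base_estimates), 4))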
Multivariate f-divergence Estimation With Confidence
This work establishes the asymptotic normality of a recently proposed ensemble estimator of f-divergence between two distributions from a finite number of samples, which has an MSE convergence rate of O(1/T), is simple to implement, and performs well in high dimensions.
Nonparametric Estimation of Renyi Divergence and Friends
This work shows that nonparametric estimators of the L2, Rényi-α, and Tsallis-α divergences between continuous distributions achieve the parametric convergence rate of n^{-1/2} when the smoothness s of both densities is at least d/4, where d is the dimension.
Ensemble Estimation of Information Divergence
An empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of mean squared error, especially in high dimensions, and is shown to be robust to the choice of tuning parameters.
Nonparametric Ensemble Estimation of Distributional Functionals
An empirical estimator of Rényi-α divergence is proposed that outperforms the standard kernel density plug-in estimator, especially in high dimensions, and is shown to be robust to the choice of tuning parameters.
Non-parametric estimation of integral probability metrics
A nonparametric method for estimating the class of integral probability metrics (IPMs), examples of which include the Wasserstein distance, Dudley metric, and maximum mean discrepancy, is developed and analyzed.
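As a concrete member of the IPM family, the maximum mean discrepancy admits a simple closed-form empirical estimate; the sketch below uses the standard unbiased U-statistic form with a Gaussian kernel. The kernel width, dimension, and sample sizes are arbitrary illustrative choices and are not tied to the cited paper's estimators of the Wasserstein or Dudley metrics.

    # Sketch: unbiased estimate of squared MMD, one member of the IPM family,
    # between samples x ~ P and y ~ Q using a Gaussian kernel.
    import numpy as np

    def gaussian_kernel(a, b, sigma=1.0):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def mmd2_unbiased(x, y, sigma=1.0):
        kxx = gaussian_kernel(x, x, sigma)
        kyy = gaussian_kernel(y, y, sigma)
        kxy = gaussian_kernel(x, y, sigma)
        n, m = len(x), len(y)
        # Drop diagonal terms so the within-sample averages are unbiased.
        term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
        term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
        return term_x + term_y - 2.0 * kxy.mean()

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(500, 2))   # samples from P
    y = rng.normal(0.5, 1.0, size=(500, 2))   # samples from Q
    print(f"unbiased MMD^2 estimate: {mmd2_unbiased(x, y):.4f}")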
Minimax rate-optimal estimation of KL divergence between discrete distributions
A minimax rate-optimal estimator is constructed which is adaptive in the sense that it requires knowledge of neither the support size nor an upper bound on the likelihood ratio, and the effective sample size enlargement phenomenon is shown to hold.
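For contrast with the minimax-optimal construction described above (which additionally relies on polynomial approximation in the low-count regime), a naive plug-in estimate of KL divergence between discrete distributions is easy to write down; everything below, including the smoothing constant, is an illustrative assumption rather than the cited estimator.

    # Sketch: naive plug-in estimate of KL(P||Q) on a finite alphabet, shown
    # only as a baseline; the minimax rate-optimal estimator is more involved.
    import numpy as np

    def plug_in_kl(samples_p, samples_q, alphabet_size):
        p_hat = np.bincount(samples_p, minlength=alphabet_size) / len(samples_p)
        q_hat = np.bincount(samples_q, minlength=alphabet_size) / len(samples_q)
        # Clamp q_hat away from zero so symbols unseen under Q do not blow up.
        q_hat = np.maximum(q_hat, 1.0 / (10.0 * len(samples_q)))
        mask = p_hat > 0
        return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / q_hat[mask])))

    rng = np.random.default_rng(0)
    k = 20
    p, q = rng.dirichlet(np.ones(k)), rng.dirichlet(np.ones(k))
    xp, xq = rng.choice(k, size=5000, p=p), rng.choice(k, size=5000, p=q)
    true_kl = float(np.sum(p * np.log(p / q)))
    print(f"plug-in estimate: {plug_in_kl(xp, xq, k):.3f}   true value: {true_kl:.3f}")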
Nonparametric divergence estimators for independent subspace analysis
New nonparametric Rényi, Tsallis, and L2 divergence estimators are proposed, and their applicability to mutual information estimation and independent subspace analysis is demonstrated.
On Estimating L2^2 Divergence
We give a comprehensive theoretical characterization of a nonparametric estimator for the L2 divergence between two continuous distributions. We first bound the rate of convergence of our estimator, …
Minimax Estimation of KL Divergence between Discrete Distributions
This work refines the approach recently developed for constructing near-minimax estimators of functionals of high-dimensional parameters, such as entropy, Rényi entropy, mutual information, and ℓ1 distance in large-alphabet settings, and shows that the effective sample size enlargement phenomenon holds significantly more widely than previously established.

References

Showing 1-10 of 75 references
Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
An algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions is developed and analyzed, based on a variational characterization of f-divergences, which turns the estimation into a penalized convex risk minimization problem.
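Relative to the unpenalized sketch given after the abstract above, the penalized formulation simply adds a regularization term to the empirical variational objective; the ℓ2 penalty below is a hypothetical stand-in for the paper's penalty on the complexity of g.

    # Sketch: penalized variant of the variational objective; an l2 penalty on
    # the coefficients stands in for a penalty on the complexity of g.
    import numpy as np

    def neg_objective_penalized(theta, phi_p, phi_q, lam=1e-2):
        g_p, g_q = phi_p @ theta, phi_q @ theta
        fit = g_p.mean() - np.exp(np.minimum(g_q - 1.0, 50.0)).mean()
        return -fit + lam * np.dot(theta, theta)   # still convex in theta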
Nonparametric estimation of the likelihood ratio and divergence functionals
This work develops and analyzes a nonparametric method for estimating the class of f-divergence functionals and the density ratio of two probability distributions, obtaining an M-estimator for divergences based on a convex and differentiable optimization problem that can be solved efficiently.
On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method
Abstract: A class of probability density estimates can be obtained by penalizing the likelihood by a functional which depends on the roughness of the logarithm of the density. The limiting case of …
Parametric estimation and tests through divergences and the duality technique
A solution to the irregularity problem of the generalized likelihood ratio test pertaining to the number of components in a mixture is given, and a new test based on the χ²-divergence on signed finite measures and the duality technique is proposed.
Divergence estimation of continuous distributions based on data-dependent partitions
A universal estimator of the divergence D(P‖Q) for two arbitrary continuous distributions P and Q satisfying certain regularity conditions is proposed, and it achieves the best convergence performance in most of the tested cases.
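In the same spirit as the partition-based construction (though not the cited estimator itself), the one-dimensional sketch below builds cells from empirical quantiles of the Q sample and computes the divergence between the induced cell frequencies; the number of cells and the light smoothing are illustrative choices.

    # Sketch: 1-D data-dependent partition estimate of KL(P||Q): cells are
    # empirical quantiles of the Q sample; the divergence is taken between the
    # resulting cell frequencies.  Cell count and smoothing are illustrative.
    import numpy as np

    def partition_kl(x_p, x_q, n_cells=20):
        edges = np.quantile(x_q, np.linspace(0.0, 1.0, n_cells + 1))
        edges[0] = min(edges[0], x_p.min()) - 1e-9    # widen end cells to cover all data
        edges[-1] = max(edges[-1], x_p.max()) + 1e-9
        p_counts, _ = np.histogram(x_p, bins=edges)
        q_counts, _ = np.histogram(x_q, bins=edges)
        p_hat = (p_counts + 0.5) / (len(x_p) + 0.5 * n_cells)   # light smoothing
        q_hat = (q_counts + 0.5) / (len(x_q) + 0.5 * n_cells)
        return float(np.sum(p_hat * np.log(p_hat / q_hat)))

    rng = np.random.default_rng(0)
    x_p = rng.normal(1.0, 1.0, size=4000)   # P = N(1, 1)
    x_q = rng.normal(0.0, 1.0, size=4000)   # Q = N(0, 1); true KL(P||Q) = 0.5
    print(f"partition estimate of KL(P||Q): {partition_kl(x_p, x_q):.3f}")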
On empirical likelihood for semiparametric two-sample density ratio models
Abstract: We consider estimation and test problems for some semiparametric two-sample density ratio models. The profile empirical likelihood (EL) poses an irregularity problem under the null …
Density-free convergence properties of various estimators of entropy
Abstract: Let f(x) be a probability density function, x ∈ R^d. The Shannon (or differential) entropy is defined as H(f) = −∫ f(x) log f(x) dx. In this paper we propose, based on a random sample X1, …, Xn, …
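As one concrete example of a nonparametric entropy estimator (a common k-nearest-neighbour construction, not necessarily the one studied in the cited paper), the sketch below estimates H(f) directly from a sample; k, the sample size, and the Gaussian test case are illustrative.

    # Sketch: Kozachenko-Leonenko style k-NN estimate of differential entropy
    # H(f) = -int f log f, shown only as an example of a nonparametric
    # entropy estimator.  k and the sample are arbitrary illustrative choices.
    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import digamma, gammaln

    def knn_entropy(x, k=3):
        n, d = x.shape
        # Distance of each point to its k-th nearest neighbour (self excluded).
        dist, _ = cKDTree(x).query(x, k=k + 1)
        eps = dist[:, -1]
        log_vd = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)  # log volume of unit d-ball
        return float(digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(eps)))

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(5000, 1))          # N(0,1); H = 0.5*log(2*pi*e)
    print(f"k-NN entropy estimate: {knn_entropy(x):.3f}"
          f"   (true value {0.5 * np.log(2 * np.pi * np.e):.3f})")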
Convexity, Classification, and Risk Bounds
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex …
On the estimation of entropy
Motivated by recent work of Joe (1989, Ann. Inst. Statist. Math., 41, 683–697), we introduce estimators of entropy and describe their properties. We study the effects of tail behaviour, distribution …
Geometrizing Rates of Convergence, III
Consider estimating a functional T(F) of an unknown distribution F ∈ F from data X1, …, Xn i.i.d. F. Let ω_T(ε) denote the modulus of continuity of the functional T over F, computed with respect to …