Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization

@article{Nguyen2010EstimatingDF,
title={Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization},
author={X. Nguyen and M. Wainwright and Michael I. Jordan},
journal={IEEE Transactions on Information Theory},
year={2010},
volume={56},
pages={5847-5861}
}
• Published 2010
• Mathematics, Computer Science
• IEEE Transactions on Information Theory
We develop and analyze M-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a nonasymptotic variational characterization of f-divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these…
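As a rough illustration of the variational idea in the abstract, the sketch below estimates the Kullback–Leibler divergence, which is the f-divergence with f(u) = u log u; its convex conjugate is f*(v) = exp(v − 1). For any function g, KL(P‖Q) ≥ E_P[g(X)] − E_Q[exp(g(Y) − 1)], with equality at g = 1 + log(dP/dQ). Maximizing the empirical version of this bound is a convex problem that yields a divergence estimate and, through exp(ĝ − 1), a likelihood-ratio estimate. This is a simplified sketch, not the paper's estimator: the paper works with richer nonparametric function classes, whereas here g is assumed to be linear in a user-supplied feature map. The names `kl_variational_estimate` and `linear_features` and the damped Newton solver are hypothetical choices made for this example.

```python
import numpy as np


def linear_features(x):
    """Hypothetical feature map phi(x) = [1, x], so g(x) = w0 + w1 * x."""
    return np.column_stack([np.ones_like(x), x])


def kl_variational_estimate(x_p, x_q, features, n_iter=50, tol=1e-10):
    """Hypothetical sketch: estimate KL(P || Q) from samples of P and Q.

    Maximizes the empirical variational objective
        J(w) = mean_P[g(X)] - mean_Q[exp(g(Y) - 1)],  g(x) = features(x) @ w,
    which is concave in w (a linear term minus a convex term), using damped
    Newton ascent. Returns (divergence_estimate, w, ratio_fn), where
    ratio_fn(x) = exp(g(x) - 1) estimates the likelihood ratio dP/dQ.
    """
    phi_p = features(x_p)  # shape (n_p, k)
    phi_q = features(x_q)  # shape (n_q, k)
    mean_phi_p = phi_p.mean(axis=0)

    def objective(w):
        return mean_phi_p @ w - np.mean(np.exp(phi_q @ w - 1.0))

    w = np.zeros(phi_p.shape[1])
    for _ in range(n_iter):
        e = np.exp(phi_q @ w - 1.0)  # exp(g(y) - 1) at each Q sample
        grad = mean_phi_p - (phi_q * e[:, None]).mean(axis=0)
        # Negative Hessian: mean_Q[exp(g - 1) * phi phi^T]; positive definite
        # when the features are linearly independent on the Q sample.
        neg_hess = (phi_q * e[:, None]).T @ phi_q / phi_q.shape[0]
        step = np.linalg.solve(neg_hess, grad)  # Newton ascent direction
        # Backtracking: halve the step while it would decrease the objective.
        t, f0 = 1.0, objective(w)
        while objective(w + t * step) < f0 and t > 1e-8:
            t *= 0.5
        w = w + t * step
        if np.max(np.abs(t * step)) < tol:
            break

    def ratio_fn(x):
        return np.exp(features(x) @ w - 1.0)

    return objective(w), w, ratio_fn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_p = rng.normal(1.0, 1.0, size=20000)  # samples from P = N(1, 1)
    x_q = rng.normal(0.0, 1.0, size=20000)  # samples from Q = N(0, 1)
    kl_hat, w, ratio = kl_variational_estimate(x_p, x_q, linear_features)
    print(f"KL estimate: {kl_hat:.3f} (true value 0.5)")
    print("estimated dP/dQ at x = 0:", ratio(np.array([0.0]))[0])
```

In the usage example, P = N(1, 1) and Q = N(0, 1), so the true divergence is 1/2 and log(dP/dQ)(x) = x − 1/2 is linear in x. The features [1, x] therefore contain the optimal g, and for large samples the estimate should land close to 0.5. With a feature class that does not contain the optimum, the maximized objective is only a lower bound on the divergence.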
481 Citations


Improving convergence of divergence functional ensemble estimators
• Mathematics, Computer Science
• 2016 IEEE International Symposium on Information Theory (ISIT)
• 2016
The theory of optimally weighted ensemble estimation is generalized to derive two estimators that achieve the parametric rate when the densities are sufficiently smooth and an empirical estimator of Rényi-α divergence that outperforms the standard kernel density plug-in estimator, especially in higher dimensions.
Multivariate f-divergence Estimation With Confidence
• Computer Science, Mathematics
• NIPS
• 2014
This work establishes the asymptotic normality of a recently proposed ensemble estimator of f-divergence between two distributions from a finite number of samples, which has MSE convergence rate of O(1/T), is simple to implement, and performs well in high dimensions.
Nonparametric Estimation of Renyi Divergence and Friends
• Mathematics, Computer Science
• ICML
• 2014
This work shows that nonparametric estimators of the L2, Rényi-α and Tsallis-α divergences between continuous distributions achieve the parametric convergence rate of n^{-1/2} when the smoothness s of both densities is at least d/4, where d is the dimension.
Ensemble Estimation of Information Divergence
• Computer Science, Mathematics
• Entropy
• 2018
An empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of mean squared error, especially in high dimensions and is shown to be robust to the choice of tuning parameters.
Nonparametric Ensemble Estimation of Distributional Functionals
• Mathematics
• 2016
An empirical estimator of Rényi-α divergence is proposed that outperforms the standard kernel density plug-in estimator, especially in high dimensions, and is shown to be robust to the choice of tuning parameters.
Non-parametric estimation of integral probability metrics
• Mathematics, Computer Science
• 2010 IEEE International Symposium on Information Theory
• 2010
A nonparametric method for estimating the class of integral probability metrics (IPMs), examples of which include the Wasserstein distance, Dudley metric, and maximum mean discrepancy, is developed and analyzed.
Minimax rate-optimal estimation of KL divergence between discrete distributions
• Computer Science, Mathematics
• 2016 International Symposium on Information Theory and Its Applications (ISITA)
• 2016
A minimax rate-optimal estimator is constructed that is adaptive in the sense that it requires knowledge of neither the support size nor an upper bound on the likelihood ratio, and the effective sample size enlargement phenomenon holds.
Nonparametric divergence estimators for independent subspace analysis
• Computer Science, Mathematics
• 2011 19th European Signal Processing Conference
• 2011
New nonparametric Rényi, Tsallis, and L2 divergence estimators are proposed and their applicability to mutual information estimation and independent subspace analysis is demonstrated.
On Estimating L₂² Divergence
We give a comprehensive theoretical characterization of a nonparametric estimator for the L2 divergence between two continuous distributions. We first bound the rate of convergence of our estimator,…
Minimax Estimation of KL Divergence between Discrete Distributions
• Computer Science, Mathematics
• ArXiv
• 2016
This work refines the approach recently developed for constructing near-minimax estimators of functionals of high-dimensional parameters, such as entropy, Rényi entropy, mutual information and ℓ1 distance, in large-alphabet settings, and shows that the effective sample size enlargement phenomenon holds significantly more widely than previously established.

References

Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
• Computer Science, Mathematics
• NIPS
• 2007
An algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions is developed and analyzed, based on a variational characterization of f-divergences, which turns the estimation into a penalized convex risk minimization problem.
Nonparametric estimation of the likelihood ratio and divergence functionals
• Mathematics, Computer Science
• 2007 IEEE International Symposium on Information Theory
• 2007
This work develops and analyzes a nonparametric method for estimating the class of f-divergence functionals, and the density ratio of two probability distributions, and obtains an M-estimator for divergences, based on a convex and differentiable optimization problem that can be solved efficiently.
On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method
A class of probability density estimates can be obtained by penalizing the likelihood by a functional which depends on the roughness of the logarithm of the density. The limiting case of…
Parametric estimation and tests through divergences and the duality technique
• Computer Science, Mathematics
• J. Multivar. Anal.
• 2009
A solution to the irregularity problem of the generalized likelihood ratio test pertaining to the number of components in a mixture is given, and a new test based on χ²-divergence on signed finite measures and the duality technique is proposed.
Divergence estimation of continuous distributions based on data-dependent partitions
• Mathematics, Computer Science
• IEEE Transactions on Information Theory
• 2005
A universal estimator of the divergence D(P‖Q) for two arbitrary continuous distributions P and Q satisfying certain regularity conditions is proposed; it achieves the best convergence performance in most of the tested cases.
On empirical likelihood for semiparametric two-sample density ratio models
• Mathematics
• 2008
We consider estimation and test problems for some semiparametric two-sample density ratio models. The profile empirical likelihood (EL) poses an irregularity problem under the null…
Density-free convergence properties of various estimators of entropy
• Mathematics
• 1987
Let f(x) be a probability density function, x ∈ ℝ^d. The Shannon (or differential) entropy is defined as H(f) = −∫ f(x) log f(x) dx. In this paper we propose, based on a random sample X1, …, Xn…
Convexity, Classification, and Risk Bounds
• Mathematics
• 2006
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex…
On the estimation of entropy
• Mathematics
• 1993
Motivated by recent work of Joe (1989, Ann. Inst. Statist. Math., 41, 683–697), we introduce estimators of entropy and describe their properties. We study the effects of tail behaviour, distribution…
Geometrizing Rates of Convergence, III
• Mathematics
• 1991
Consider estimating a functional T(F) of an unknown distribution F ∈ 𝓕 from data X1, …, Xn i.i.d. F. Let ω(ε) denote the modulus of continuity of the functional T over 𝓕, computed with respect to…