# Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization

@article{Nguyen2010EstimatingDF, title={Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization}, author={X. Nguyen and M. Wainwright and Michael I. Jordan}, journal={IEEE Transactions on Information Theory}, year={2010}, volume={56}, pages={5847-5861} }

We develop and analyze M-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a nonasymptotic variational characterization of f-divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these…
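To make the idea in the abstract concrete, the following is a minimal sketch for the KL divergence only, and it is not the estimator analyzed in the paper. It relies on one standard form of the variational lower bound, D_KL(P‖Q) ≥ E_P[log g] − E_Q[g] + 1 for any positive function g, with equality when g is the likelihood ratio p/q. The log-linear model g(x) = exp(a + b·x), the damped Newton solver, the Gaussian test data, and the function names `objective` and `estimate_kl` are all assumptions made for illustration.

```python
import math
import random


def objective(a, b, mean_p, xs_q):
    """Empirical variational objective for the assumed model g(x) = exp(a + b*x).

    Equals mean_P[log g] - mean_Q[g] + 1 and is concave in (a, b).
    """
    mean_g = sum(math.exp(a + b * y) for y in xs_q) / len(xs_q)
    return a + b * mean_p - mean_g + 1.0


def estimate_kl(xs_p, xs_q, max_iter=50, tol=1e-10):
    """Maximize the empirical objective with damped Newton steps.

    Returns (KL estimate, (a, b)); exp(a + b*x) estimates the ratio p(x)/q(x).
    """
    mean_p = sum(xs_p) / len(xs_p)
    m = len(xs_q)
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        # Weighted moments of the Q sample under the current ratio estimate.
        w = [math.exp(a + b * y) for y in xs_q]
        s0 = sum(w) / m
        s1 = sum(wi * y for wi, y in zip(w, xs_q)) / m
        s2 = sum(wi * y * y for wi, y in zip(w, xs_q)) / m
        grad_a, grad_b = 1.0 - s0, mean_p - s1
        # The Hessian is -[[s0, s1], [s1, s2]]; solve for the Newton direction.
        det = s0 * s2 - s1 * s1
        step_a = (s2 * grad_a - s1 * grad_b) / det
        step_b = (s0 * grad_b - s1 * grad_a) / det
        # Halve the step until the objective does not decrease.
        current = a + b * mean_p - s0 + 1.0
        t = 1.0
        while t > 1e-8 and objective(a + t * step_a, b + t * step_b, mean_p, xs_q) < current:
            t *= 0.5
        a += t * step_a
        b += t * step_b
        if abs(t * step_a) + abs(t * step_b) < tol:
            break
    return objective(a, b, mean_p, xs_q), (a, b)


if __name__ == "__main__":
    rng = random.Random(0)
    xs_p = [rng.gauss(1.0, 1.0) for _ in range(20000)]  # sample from P = N(1, 1)
    xs_q = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # sample from Q = N(0, 1)
    kl, (a, b) = estimate_kl(xs_p, xs_q)
    print(f"KL estimate: {kl:.3f}, log-ratio fit: {a:.3f} + {b:.3f} * x")
```

For P = N(1, 1) and Q = N(0, 1) the true log-ratio is x − 1/2 and the true divergence is 1/2, so the fitted coefficients and the objective value should land near those numbers. A richer function class for g would replace the two-parameter model here, at the cost of needing regularization to keep the empirical problem well behaved.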

#### 481 Citations

Improving convergence of divergence functional ensemble estimators

- Mathematics, Computer Science
- 2016 IEEE International Symposium on Information Theory (ISIT)
- 2016

The theory of optimally weighted ensemble estimation is generalized to derive two estimators that achieve the parametric rate when the densities are sufficiently smooth and an empirical estimator of Rényi-α divergence that outperforms the standard kernel density plug-in estimator, especially in higher dimensions.

Multivariate f-divergence Estimation With Confidence

- Computer Science, Mathematics
- NIPS
- 2014

This work establishes the asymptotic normality of a recently proposed ensemble estimator of f-divergence between two distributions from a finite number of samples, which has MSE convergence rate of O(1/T), is simple to implement, and performs well in high dimensions.

Nonparametric Estimation of Renyi Divergence and Friends

- Mathematics, Computer Science
- ICML
- 2014

This work shows that nonparametric estimation of L2, Rényi-α and Tsallis-α divergences between continuous distributions achieves the parametric convergence rate of n^{-1/2} when the smoothness s of both densities is at least d/4, where d is the dimension.

Ensemble Estimation of Information Divergence †

- Computer Science, Mathematics
- Entropy
- 2018

An empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of mean squared error, especially in high dimensions, and is shown to be robust to the choice of tuning parameters.

Nonparametric Ensemble Estimation of Distributional Functionals

- Mathematics
- 2016

An empirical estimator of Rényi-α divergence is proposed that outperforms the standard kernel density plug-in estimator, especially in high dimensions, and is shown to be robust to the choice of tuning parameters.

Non-parametric estimation of integral probability metrics

- Mathematics, Computer Science
- 2010 IEEE International Symposium on Information Theory
- 2010

A nonparametric method for estimating the class of integral probability metrics (IPMs), examples of which include the Wasserstein distance, Dudley metric, and maximum mean discrepancy, is developed and analyzed.

Minimax rate-optimal estimation of KL divergence between discrete distributions

- Computer Science, Mathematics
- 2016 International Symposium on Information Theory and Its Applications (ISITA)
- 2016

A minimax rate-optimal estimator is constructed which is adaptive in the sense that it does not require the knowledge of the support size nor the upper bound on the likelihood ratio, and the effective sample size enlargement phenomenon holds.

Nonparametric divergence estimators for independent subspace analysis

- Computer Science, Mathematics
- 2011 19th European Signal Processing Conference
- 2011

New nonparametric Rényi, Tsallis, and L2 divergence estimators are proposed and their applicability to mutual information estimation and independent subspace analysis is demonstrated.

On Estimating $L_2^2$ Divergence

- 2015

We give a comprehensive theoretical characterization of a nonparametric estimator for the L2 divergence between two continuous distributions. We first bound the rate of convergence of our estimator,…

Minimax Estimation of KL Divergence between Discrete Distributions

- Computer Science, Mathematics
- ArXiv
- 2016

This work refines the approach recently developed for the construction of near-minimax estimators of functionals of high-dimensional parameters, such as entropy, Rényi entropy, mutual information and $\ell_1$ distance in large alphabet settings, and shows that the effective sample size enlargement phenomenon holds significantly more widely than previously established.

#### References

Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization

- Computer Science, Mathematics
- NIPS
- 2007

An algorithm for nonparametric estimation of divergence functionals and the density ratio of two probability distributions is developed and analyzed, based on a variational characterization of f-divergences, which turns the estimation into a penalized convex risk minimization problem.

Nonparametric estimation of the likelihood ratio and divergence functionals

- Mathematics, Computer Science
- 2007 IEEE International Symposium on Information Theory
- 2007

This work develops and analyzes a nonparametric method for estimating the class of f-divergence functionals and the density ratio of two probability distributions, and obtains an M-estimator for divergences based on a convex and differentiable optimization problem that can be solved efficiently.

On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method

- Mathematics
- 1982

A class of probability density estimates can be obtained by penalizing the likelihood by a functional which depends on the roughness of the logarithm of the density. The limiting case of…

Parametric estimation and tests through divergences and the duality technique

- Computer Science, Mathematics
- J. Multivar. Anal.
- 2009

A solution to the irregularity problem of the generalized likelihood ratio test pertaining to the number of components in a mixture is given, and a new test based on χ²-divergence on signed finite measures and the duality technique is proposed.

Divergence estimation of continuous distributions based on data-dependent partitions

- Mathematics, Computer Science
- IEEE Transactions on Information Theory
- 2005

A universal estimator of the divergence D(P‖Q) for two arbitrary continuous distributions P and Q satisfying certain regularity conditions that achieves the best convergence performance in most of the tested cases.

On empirical likelihood for semiparametric two-sample density ratio models

- Mathematics
- 2008

We consider estimation and test problems for some semiparametric two-sample density ratio models. The profile empirical likelihood (EL) poses an irregularity problem under the null…

Density-free convergence properties of various estimators of entropy

- Mathematics
- 1987

Let f(x) be a probability density function, x ∈ R^d. The Shannon (or differential) entropy is defined as H(f) = −∫ f(x) log f(x) dx. In this paper we propose, based on a random sample X_1, …, X_n…

Convexity, Classification, and Risk Bounds

- Mathematics
- 2006

Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex…

On the estimation of entropy

- Mathematics
- 1993

Motivated by recent work of Joe (1989, Ann. Inst. Statist. Math., 41, 683–697), we introduce estimators of entropy and describe their properties. We study the effects of tail behaviour, distribution…

Geometrizing Rates of Convergence, III

- Mathematics
- 1991

Consider estimating a functional T(F) of an unknown distribution F ∈ 𝓕 from data X_1, …, X_n i.i.d. F. Let ω(ε) denote the modulus of continuity of the functional T over 𝓕, computed with respect to…