# On surrogate loss functions and f-divergences

@article{Nguyen2005ONSL,
title={On surrogate loss functions and f-divergences},
author={X. Nguyen and M. Wainwright and Michael I. Jordan},
journal={Annals of Statistics},
year={2005},
volume={37},
pages={876-904}
}
• Published 2005
• Mathematics, Computer Science
• Annals of Statistics
The goal of binary classification is to estimate a discriminant function y from observations of covariate vectors and corresponding binary labels. We consider an elaboration of this problem in which the covariates are not available directly but are transformed by a dimensionality-reducing quantizer Q. We present conditions on loss functions such that empirical risk minimization yields Bayes consistency when both the discriminant function and the quantizer are estimated. These conditions are…
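As a minimal illustration of the empirical-risk-minimization setup described in the abstract, the sketch below fits a linear discriminant by minimizing a convex surrogate (the logistic loss) with plain gradient descent. It omits the quantizer Q entirely, and the function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def logistic_surrogate_erm(X, y, lr=0.1, steps=500):
    """Minimize the empirical logistic surrogate risk
    (1/n) * sum_i log(1 + exp(-y_i * <w, x_i>)) by gradient descent.
    Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # gradient of phi(z) = log(1 + exp(-z)) at z = y_i <w, x_i>
        weights = y * (1.0 / (1.0 + np.exp(margins)))
        grad = -(X * weights[:, None]).mean(axis=0)
        w -= lr * grad
    return w

# toy data: two Gaussian classes centered at -1 and +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = logistic_surrogate_erm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

Because the logistic loss is a classification-calibrated convex upper bound on the 0-1 loss, driving this surrogate risk down also drives the empirical error rate down on separable-enough data.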
114 Citations


Multiclass Classification, Information, Divergence, and Surrogate Risk (submitted to the Annals of Statistics)
We provide a unifying view of statistical information measures, multi-way Bayesian hypothesis testing, loss functions for multi-class classification problems, and multi-distribution f-divergences…
Convexity, Detection, and Generalized f-divergences
The goal of the classification problem is to learn a discriminant function for classifying forthcoming data. Given data coming in i.i.d. pairs (Xi, Yi), for 1 ≤ i ≤ n, from some underlying…
Information Measures, Experiments, Multi-category Hypothesis Tests, and Surrogate Losses
• Mathematics, Computer Science
• ArXiv
• 2016
A main consequence of the results is to describe those convex loss functions that are Fisher consistent for jointly choosing a data representation and minimizing the (weighted) probability of error in multi-category classification and hypothesis testing problems.
A Class of Parameterized Loss Functions for Classification: Optimization Tradeoffs and Robustness Characteristics.
• Computer Science
• 2019
It is proved that smaller $\alpha$ values are more conducive to faster optimization, it is suggested that larger $\alpha$ values lead to better generalization performance, and strong evidence supporting this assertion is provided by several experiments on benchmark datasets.
Convex Surrogate Minimization in Classification
• Mathematics
• 2018
Convex optimization has become an increasingly important theme in applications. We consider the construction of a binary classification rule by minimizing the risk based on a convex loss as a…
On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL
• Mathematics, Computer Science
• ArXiv
• 2017
This chapter argues that if such classifiers, in their respective training phases, optimize a so-called surrogate loss, then it may also be valuable to compare the behavior of this loss on the test set, next to the regular classification error rate.
A Tunable Loss Function for Binary Classification
• Computer Science, Mathematics
• 2019 IEEE International Symposium on Information Theory (ISIT)
• 2019
It is proved that α-loss has an equivalent margin-based form and is classification-calibrated, two desirable properties for a good surrogate loss function for the ideal yet intractable 0-1 loss.
A Tunable Loss Function for Classification
• Computer Science, Mathematics
• ArXiv
• 2019
It is proved that smaller $\alpha$ values are more conducive to faster optimization, it is suggested that larger $\alpha$ values lead to better generalization performance, and strong evidence supporting this assertion is provided by several experiments on benchmark datasets.
Information, Divergence and Risk for Binary Experiments
• Computer Science, Mathematics
• J. Mach. Learn. Res.
• 2011
The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates maximum mean discrepancy to Fisher linear discriminants.
New aspects of Bregman divergence in regression and classification with parametric and nonparametric estimation
• Mathematics
• 2009
In statistical learning, regression and classification concern different types of the output variables, and the predictive accuracy is quantified by different loss functions. This article explores…

#### References

Showing 1–10 of 56 references
Convexity, Classification, and Risk Bounds
• Mathematics
• 2006
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex…
Statistical behavior and consistency of classification methods based on convex risk minimization
We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification…
Special Invited Paper: Additive logistic regression: A statistical view of boosting
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data…
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
• Computer Science, Mathematics
• COLT 1997
• 1997
The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone–Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Consistency of support vector machines and other regularized kernel classifiers
• Ingo Steinwart
• Mathematics, Computer Science
• IEEE Transactions on Information Theory
• 2005
It is shown that various classifiers that are based on minimization of a regularized risk are universally consistent, i.e., they can asymptotically learn in every classification task. The role of the…
Greedy Algorithms for Classification -- Consistency, Convergence Rates, and Adaptivity
• Mathematics, Computer Science
• J. Mach. Learn. Res.
• 2003
Focusing on specific classes of problems, this work provides conditions under which their greedy procedure achieves the (nearly) minimax rate of convergence, implying that the procedure cannot be improved in a worst case setting.
On the Bayes-risk consistency of regularized boosting methods
• Mathematics
• 2003
The probability of error of classification methods based on convex combinations of simple base classifiers by boosting algorithms is investigated. The main result of the paper is that certain…
Extremal properties of likelihood-ratio quantizers
• J. Tsitsiklis
• Mathematics, Computer Science
• IEEE Trans. Commun.
• 1993
Optimality properties of likelihood-ratio quantizers are established for a very broad class of quantization problems, including problems involving the maximization of an Ali-Silvey (1966) distance measure and the Neyman-Pearson variant of the decentralized detection problem.
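The likelihood-ratio quantizers in this reference can be illustrated with a small sketch (the densities, thresholds, and function names below are chosen for illustration, not taken from the paper): a quantizer's cells partition the range of the likelihood ratio p1(x)/p0(x) rather than the raw observation space.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma=1.0):
    """Unit-variance Gaussian density used as a toy hypothesis pair."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def likelihood_ratio_quantizer(x, thresholds, mu0=0.0, mu1=1.0):
    """Map each observation to a cell index by thresholding the
    likelihood ratio L(x) = p1(x) / p0(x).  For two unit-variance
    Gaussians L(x) = exp(x - 0.5) is monotone in x, so this is also an
    interval quantizer in x -- but in general the cells live in L-space."""
    L = gaussian_pdf(x, mu1) / gaussian_pdf(x, mu0)
    # searchsorted returns the cell index: 0 below the first threshold,
    # len(thresholds) above the last one
    return np.searchsorted(np.asarray(thresholds), L)

x = np.array([-2.0, 0.0, 0.5, 2.0])
cells = likelihood_ratio_quantizer(x, thresholds=[0.5, 2.0])
# → cells [0, 1, 1, 2]: L(x) = exp(x - 0.5) falls below 0.5, between
#   0.5 and 2.0 (twice), and above 2.0 respectively
```

The optimality result summarized above says that, for Ali-Silvey (f-divergence) objectives, nothing is lost by restricting attention to quantizers of exactly this thresholded form.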
Applications of Ali-Silvey Distance Measures in the Design of Generalized Quantizers for Binary Decision Systems
• Mathematics, Computer Science
• IEEE Trans. Commun.
• 1977
The Ali-Silvey class of distance measures is applied to the problem of designing quantizers for use in binary detection systems, and necessary conditions are established for the selection of quantizer parameters in this context.
The Divergence and Bhattacharyya Distance Measures in Signal Selection
Minimization of the error probability to determine optimum signals is often difficult to carry out. Consequently, several suboptimum performance measures that are easier than the error probability to…