# Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications

@inproceedings{Buja2005LossFF, title={Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications}, author={Andreas Buja and Werner Stuetzle and Yi Shen}, year={2005} }

What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and as limiting cases cost-weighted misclassification losses. Proper scoring rules have a rich structure: • Every proper…
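The Fisher-consistency property described in the abstract can be checked numerically. The sketch below is illustrative and not taken from the paper: the function names, the grid search, and the particular form used for the boosting loss (square roots of the odds ratio) are assumptions made for this example. For a true class-1 probability `eta`, it finds the estimate `q` that minimizes the expected loss; for a proper scoring rule that minimizer should be `eta` itself. Absolute error, which is not mentioned in the abstract, is included only as a contrasting example of a loss that is not proper.

```python
import math


def log_loss(y, q):
    """Negative log-likelihood of label y under estimated P(y=1) = q."""
    return -math.log(q) if y == 1 else -math.log(1.0 - q)


def squared_error(y, q):
    """Squared error (Brier score) between the label and the estimate."""
    return (y - q) ** 2


def boosting_loss(y, q):
    """Exponential (boosting) loss written on the probability scale.

    Assumed form for this sketch: sqrt((1-q)/q) for y=1, sqrt(q/(1-q)) for y=0.
    """
    odds = q / (1.0 - q)
    return odds ** -0.5 if y == 1 else odds ** 0.5


def absolute_error(y, q):
    """Absolute error; included as a contrast because it is not proper."""
    return abs(y - q)


def expected_loss(loss, eta, q):
    """Expected loss of estimate q when the true P(y=1) is eta."""
    return eta * loss(1, q) + (1.0 - eta) * loss(0, q)


def best_estimate(loss, eta, grid_size=999):
    """Grid point in (0, 1) that minimizes the expected loss."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda q: expected_loss(loss, eta, q))


if __name__ == "__main__":
    losses = [log_loss, squared_error, boosting_loss, absolute_error]
    for eta in (0.1, 0.3, 0.8):
        for loss in losses:
            q_star = best_estimate(loss, eta)
            print(f"eta={eta:.1f}  {loss.__name__:<15} argmin q = {q_star:.3f}")
```

Under these assumptions, the first three losses recover `eta` on the grid, while absolute error pushes the estimate toward 0 or 1, which is why it does not reward honest probability estimates.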

## 267 Citations

### Classifier Evaluation With Proper Scoring Rules (Diana Grygorian)

- Computer Science
- 2019

A new cost context for binary classification is presented in which both costs have their own uniform distributions. A corresponding loss function for this cost context, named Inverse Score, is proposed and subsequently proven to be a proper scoring rule.

### Strictly Proper Scoring Rules, Prediction, and Estimation

- Computer Science
- 2007

The theory of proper scoring rules on general probability spaces is reviewed and developed, and the intuitively appealing interval score is proposed as a utility function in interval estimation that addresses width as well as coverage.

### Classification using ensemble learning under weighted misclassification loss

- Computer Science
- Statistics in Medicine
- 2019

Simulations and examples show that the proposed method, which derives the score and threshold jointly, estimates overall risk more accurately and has better operating characteristics than methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.

### Linear scoring rules for probabilistic binary classification

- Computer Science
- 2016

A class of proper scoring rules called linear scoring rules is developed that is adapted to probabilistic binary classification, and it is shown that all linear scoring rules essentially balance the needs of organizers and competitors.

### On the Universality of the Logistic Loss Function

- Computer Science
- 2018 IEEE International Symposium on Information Theory (ISIT)
- 2018

This work shows that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant.

### A view of margin losses as regularizers of probability estimates

- Computer Science
- J. Mach. Learn. Res.
- 2015

A novel and unified view of this architecture is proposed, by showing that margin losses act as regularizers of posterior class probabilities, in a way that amplifies classical parameter regularization.

### Composite Binary Losses

- Computer Science
- J. Mach. Learn. Res.
- 2010

This work characterises when margin losses can be proper composite losses, explicitly shows how to determine a symmetric loss in full from half of one of its partial losses, introduces an intrinsic parametrisation of composite binary losses, and gives a complete characterisation of the relationship between proper losses and "classification calibrated" losses.

### Boosted Classification Trees and Class Probability/Quantile Estimation

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2007

An algorithm, "JOUS-Boost", is presented that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data; it preserves the advantage of relative protection against overfitting, but for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries.

### Threshold Choice Methods: the Missing Link

- Computer Science
- ArXiv
- 2011

The analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, derives several connections between these performance metrics, and highlights the role of calibration in choosing the threshold choice method.

### A unified view of performance metrics: translating threshold choice into expected classification loss

- Computer Science
- J. Mach. Learn. Res.
- 2012

This analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation which can be summarised as follows: given a model, apply the threshold choice methods that correspond with the available information about the operating condition, and compare their expected losses.

## References

### Strictly Proper Scoring Rules, Prediction, and Estimation

- Computer Science
- 2007

The theory of proper scoring rules on general probability spaces is reviewed and developed, and the intuitively appealing interval score is proposed as a utility function in interval estimation that addresses width as well as coverage.

### Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

- Computer Science
- NIPS
- 2003

This work studies the statistical consequences of using a convex surrogate of the 0-1 loss function and gives nontrivial bounds under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification.

### Statistical behavior and consistency of classification methods based on convex risk minimization

- Computer Science
- 2003

This study sheds light on the good performance of some recently proposed linear classification methods including boosting and support vector machines and shows their limitations and suggests possible improvements.

### Greedy function approximation: A gradient boosting machine.

- Computer Science
- 2001

A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.

### Evaluating probabilities: asymmetric scoring rules

- Environmental Science
- 1994

Proper scoring rules are overall evaluation measures that reward accurate probabilities. Specific rules encountered in the literature and used in practice are invariably symmetric in the sense that the…

### Improved Boosting Algorithms Using Confidence-rated Predictions

- Computer Science
- COLT '98
- 1998

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…

### On the boosting ability of top-down decision tree learning algorithms

- Computer Science
- STOC '96
- 1996

This work analyzes the performance of top-down algorithms for decision tree learning and proves that some popular and empirically successful heuristics that are based on first principles meet the criteria of an independently motivated theoretical model.

### Data mining criteria for tree-based regression and classification

- Computer Science
- KDD '01
- 2001

This paper proposes new splitting criteria for growing trees that are more adapted to data mining applications than conventional trees, and adopts a data mining point of view by proposing criteria that search for interesting subsets of the data.

### Admissible probability measurement procedures

- Mathematics
- Psychometrika
- 1966

In this case, it is shown that a certain minor modification of a scoring system with the reproducing property yields the desired admissible probability measurement procedure.

### Experiments with a New Boosting Algorithm

- Computer Science
- ICML
- 1996

This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.