Corpus ID: 5925076

Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications

Andreas Buja, Werner Stuetzle, Yi Shen
What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and as limiting cases cost-weighted misclassification losses. Proper scoring rules have a rich structure: • Every proper… 
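The Fisher-consistency property named in the abstract can be checked numerically. The sketch below is illustrative only and is not taken from the paper's text: it assumes the standard probability-scale forms of log-loss and squared error loss, and writes boosting (exponential) loss as sqrt((1-q)/q) for class 1 and sqrt(q/(1-q)) for class 0. The names `expected_loss`, `best_estimate`, and the grid search are hypothetical helpers. For a true class-1 probability `eta`, each loss should have its expected value minimized at the estimate q = eta.

```python
import math


def log_loss(y, q):
    """Log-loss for label y in {0, 1} and estimated class-1 probability q."""
    return -math.log(q) if y == 1 else -math.log(1.0 - q)


def squared_error_loss(y, q):
    """Squared error loss between the label and the estimate."""
    return (y - q) ** 2


def boosting_loss(y, q):
    """Exponential (boosting) loss rewritten on the probability scale (assumed form)."""
    return math.sqrt((1.0 - q) / q) if y == 1 else math.sqrt(q / (1.0 - q))


def expected_loss(loss, eta, q):
    """Expected loss of estimate q when the true class-1 probability is eta."""
    return eta * loss(1, q) + (1.0 - eta) * loss(0, q)


def best_estimate(loss, eta, grid_size=999):
    """Grid point in (0, 1) that minimizes the expected loss (hypothetical helper)."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda q: expected_loss(loss, eta, q))


if __name__ == "__main__":
    eta = 0.3  # assumed true probability, chosen only for illustration
    for loss in (log_loss, squared_error_loss, boosting_loss):
        print(loss.__name__, best_estimate(loss, eta))
```

A grid search is used instead of calculus so that the same check can be applied to any candidate loss; a loss whose minimizer differs from `eta` would not be a proper scoring rule.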

Classifier Evaluation With Proper Scoring Rules (Diana Grygorian)

A new cost context for binary classification is presented in which both costs have their own uniform distributions. A corresponding loss function for this cost context, named Inverse Score, is proposed and subsequently proven to be a proper scoring rule.

Linear scoring rules for probabilistic binary classification

A class of proper scoring rules, called linear scoring rules, is developed that is properly adapted to probabilistic binary classification, and it is shown that all linear scoring rules essentially balance the needs of organizers and competitors.

A view of margin losses as regularizers of probability estimates

A novel and unified view of this architecture is proposed, by showing that margin losses act as regularizers of posterior class probabilities, in a way that amplifies classical parameter regularization.

Composite Binary Losses

This work characterises when margin losses can be proper composite losses, explicitly shows how to determine a symmetric loss in full from half of one of its partial losses, introduces an intrinsic parametrisation of composite binary losses, and gives a complete characterisation of the relationship between proper losses and "classification calibrated" losses.

Boosted Classification Trees and Class Probability/Quantile Estimation

An algorithm, "JOUS-Boost", is presented that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data. It preserves the advantage of relative protection against overfitting, but for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries.

Threshold Choice Methods: the Missing Link

The analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, derives several connections between these performance metrics, and highlights the role of calibration in choosing the threshold choice method.

A unified view of performance metrics: translating threshold choice into expected classification loss

This analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation which can be summarised as follows: given a model, apply the threshold choice methods that correspond with the available information about the operating condition, and compare their expected losses.

On Loss Functions and Regret Bounds for Multi-Category Classification

New hinge-like convex losses are derived, which are tighter extensions outside the probability simplex than related hinge-like losses and geometrically simpler with fewer non-differentiable edges, and a classification regret bound is established in general for all losses with the same generalized entropy as the zero-one loss.

Evaluating the discrimination ability of proper multi-variate scoring rules

Proper scoring rules are commonly applied to quantify the accuracy of distribution forecasts. Given an observation they assign a scalar score to each distribution forecast, with the lowest expected

Variable margin losses for classifier design

Experimental results show that the proposed variable margin losses outperform the fixed margin counterparts used by existing algorithms, and it is shown that best performance can be achieved by cross-validating the margin parameter.



Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

The statistical consequences of using a convex surrogate of the 0-1 loss function are studied, and nontrivial bounds are given under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification.

Statistical behavior and consistency of classification methods based on convex risk minimization

This study sheds light on the good performance of some recently proposed linear classification methods, including boosting and support vector machines, shows their limitations, and suggests possible improvements.

Greedy function approximation: A gradient boosting machine.

A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.

Evaluating probabilities: asymmetric scoring rules

Proper scoring rules are overall evaluation measures that reward accurate probabilities. Specific rules encountered in the literature and used in practice are invariably symmetric in the sense that the

Improved Boosting Algorithms Using Confidence-rated Predictions

We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a

On the boosting ability of top-down decision tree learning algorithms

This work analyzes the performance of top-down algorithms for decision tree learning and proves that some popular and empirically successful heuristics that are based on first principles meet the criteria of an independently motivated theoretical model.

Data mining criteria for tree-based regression and classification

This paper proposes new splitting criteria for growing trees that are more adapted to data mining applications than conventional trees, and adopts a data mining point of view by proposing criteria that search for interesting subsets of the data.

Admissible probability measurement procedures

In this case, it is shown that a certain minor modification of a scoring system with the reproducing property yields the desired admissible probability measurement procedure.

Bias, Variance, and Arcing Classifiers

This work explores two arcing algorithms, compares them to each other and to bagging, and tries to understand how arcing works, which is more successful than bagging in variance reduction.

Experiments with a New Boosting Algorithm

This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.