Corpus ID: 5925076

Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications

@inproceedings{Buja2005LossFF,
  title={Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications},
  author={A. Buja and W. Stuetzle and Yi Shen},
  year={2005}
}
What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and as limiting cases cost-weighted misclassification losses. Proper scoring rules have a rich structure: • Every proper…
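As a quick illustration of the propriety property described in the abstract, the sketch below (a minimal illustration, not code from the paper) evaluates three of the named losses on the probability scale. The boosting loss is written in its probability-scale form, sqrt((1-q)/q) for a positive outcome, which is the version commonly associated with exponential-loss boosting; the function names are ours.

```python
import numpy as np

def log_loss(y, q):
    """Log-loss: -[y log q + (1 - y) log(1 - q)]."""
    return -(y * np.log(q) + (1 - y) * np.log(1 - q))

def squared_loss(y, q):
    """Squared error (Brier) loss: (y - q)^2."""
    return (y - q) ** 2

def boosting_loss(y, q):
    """Exponential (boosting) loss on the probability scale:
    sqrt((1-q)/q) for y = 1 and sqrt(q/(1-q)) for y = 0."""
    return y * np.sqrt((1 - q) / q) + (1 - y) * np.sqrt(q / (1 - q))

# Propriety: the expected loss under a true probability p is minimized at q = p.
p = 0.3
qs = np.linspace(0.01, 0.99, 99)
for name, loss in [("log", log_loss), ("squared", squared_loss), ("boosting", boosting_loss)]:
    risk = p * loss(1, qs) + (1 - p) * loss(0, qs)
    print(f"{name}: argmin at q = {qs[np.argmin(risk)]:.2f}")  # 0.30 for each loss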
Classifier Evaluation With Proper Scoring Rules (Diana Grygorian)
Classification is a fundamental task in machine learning, which involves predicting the class of a data instance based on a set of features. Performance of a classifier can be measured using a loss…
Strictly Proper Scoring Rules, Prediction, and Estimation
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper…
On the Universality of the Logistic Loss Function
TLDR: This work shows that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant.
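As a numerical illustration of this kind of bound (our sketch, not the cited paper's construction), the snippet below checks the analogous statement for the squared-error divergence d(p, q) = (p − q)², whose ratio to binary KL divergence stays below 1/2 by Pinsker's inequality.

```python
import numpy as np

def kl(p, q):
    """Binary KL divergence KL(p || q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def brier_div(p, q):
    """Divergence (regret) of the squared-error score: (p - q)^2."""
    return (p - q) ** 2

grid = np.linspace(0.01, 0.99, 99)
ratio = max(brier_div(p, q) / kl(p, q) for p in grid for q in grid if p != q)
print(ratio)  # < 0.5, consistent with Pinsker's inequality KL >= 2 (p - q)^2
```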
Classification using Ensemble Learning under Weighted Misclassification Loss
TLDR: Simulations and examples show that the proposed method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics than methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.
A view of margin losses as regularizers of probability estimates
TLDR: A novel and unified view of this architecture is proposed, showing that margin losses act as regularizers of posterior class probabilities, in a way that amplifies classical parameter regularization.
On loss functions and regret bounds for multi-category classification
We develop new approaches in multi-class settings for constructing proper scoring rules and hinge-like losses and establishing corresponding regret bounds with respect to the zero-one or…
Composite Binary Losses
TLDR: This work characterises when margin losses can be proper composite losses, explicitly shows how to determine a symmetric loss in full from half of one of its partial losses, introduces an intrinsic parametrisation of composite binary losses, and gives a complete characterisation of the relationship between proper losses and “classification calibrated” losses.
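A concrete instance of a proper composite loss, sketched under the usual convention (the helper names here are ours): the logistic margin loss is exactly the log-loss composed with the sigmoid inverse link.

```python
import numpy as np

def sigmoid(f):
    """Inverse link: maps a real-valued margin score to a probability."""
    return 1.0 / (1.0 + np.exp(-f))

def logistic_margin_loss(y_pm, f):
    """Margin form, y in {-1, +1}: log(1 + exp(-y f))."""
    return np.log1p(np.exp(-y_pm * f))

def log_loss(y01, q):
    """Proper-loss form, y in {0, 1}: -[y log q + (1 - y) log(1 - q)]."""
    return -(y01 * np.log(q) + (1 - y01) * np.log(1 - q))

# The logistic margin loss is the composite of log-loss with the sigmoid link.
f = 1.7
assert np.isclose(logistic_margin_loss(+1, f), log_loss(1, sigmoid(f)))
assert np.isclose(logistic_margin_loss(-1, f), log_loss(0, sigmoid(f)))
```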
Boosted Classification Trees and Class Probability/Quantile Estimation
TLDR: An algorithm is presented that uses AdaBoost in conjunction with over/under-sampling and jittering of the data (“JOUS-Boost”); it preserves the advantage of relative protection against overfitting, but for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries.
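A hypothetical, simplified sketch of the over/under-sampling-plus-jittering idea (the function name `jous_resample` and the parameter `jitter_sd` are ours, not from the paper):

```python
import numpy as np

def jous_resample(X, y, q, jitter_sd=0.01, rng=None):
    """Simplified sketch of the over/under-sampling step: tilt the class
    proportions so that a threshold-1/2 classifier fit to the resampled data
    approximates the quantile boundary p(x) = q.  Weighting negatives by q
    and positives by (1 - q) moves the 1/2 threshold to q, since the tilted
    posterior exceeds 1/2 exactly when p(x) > q.  Jittering the duplicated
    rows breaks the ties that resampling with replacement introduces."""
    rng = rng or np.random.default_rng()
    w = np.where(y == 1, 1.0 - q, q)
    idx = rng.choice(len(y), size=len(y), replace=True, p=w / w.sum())
    X_new = X[idx] + rng.normal(scale=jitter_sd, size=X[idx].shape)
    return X_new, y[idx]
```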
Threshold Choice Methods: the Missing Link
TLDR: The analysis provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, derives several connections between the aforementioned performance metrics, and highlights the role of calibration in choosing the threshold choice method.
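One standard threshold choice method, sketched here as an illustration under stated assumptions rather than as the paper's own procedure: with calibrated probabilities and fixed misclassification costs, the cost-minimizing threshold has a closed form.

```python
import numpy as np

def bayes_threshold(c_fp, c_fn):
    """Cost-minimizing threshold for *calibrated* probabilities:
    predict positive when p > c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)

def expected_cost(p, t, c_fp, c_fn):
    """Expected cost of thresholding calibrated scores p at t."""
    return np.mean(np.where(p > t, (1 - p) * c_fp, p * c_fn))

p = np.random.default_rng(0).uniform(size=100_000)   # calibrated scores
c_fp, c_fn = 1.0, 4.0
grid = np.linspace(0.05, 0.95, 19)
best = min(grid, key=lambda t: expected_cost(p, t, c_fp, c_fn))
print(bayes_threshold(c_fp, c_fn), best)  # both near 0.2
```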
Evaluating the Discrimination Ability of Proper Multivariate Scoring Rules
Proper scoring rules are commonly applied to quantify the accuracy of distribution forecasts. Given an observation they assign a scalar score to each distribution forecast, with the lowest…

References

SHOWING 1-10 OF 56 REFERENCES
Strictly Proper Scoring Rules, Prediction, and Estimation
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper…
Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
TLDR: It is shown that convex surrogates of the 0-1 loss that satisfy a pointwise form of Fisher consistency for classification yield nontrivial bounds under the weakest possible condition on the loss function.
Statistical behavior and consistency of classification methods based on convex risk minimization
We study how closely the optimal Bayes error rate can be approached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification…
Greedy function approximation: A gradient boosting machine.
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions…
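A minimal sketch of the functional-gradient idea for squared-error loss, assuming scikit-learn trees as the base learner; this compresses the paper's general framework to its simplest case, and the function name is ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=200, lr=0.1):
    """Minimal gradient boosting for squared-error loss: each round fits a
    shallow regression tree to the negative gradient of the loss at the
    current fit, which for squared error is simply the residual y - pred."""
    pred = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                      # negative gradient
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += lr * tree.predict(X)
        trees.append(tree)
    return trees, pred
```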
Evaluating probabilities: asymmetric scoring rules
Proper scoring rules are evaluation measures that reward accurate probabilities. Specific rules encountered in the literature and used in practice are invariably symmetric in the sense that the…
On the boosting ability of top-down decision tree learning algorithms
TLDR: This work analyzes the performance of top-down algorithms for decision tree learning and proves that some popular and empirically successful heuristics that are based on first principles meet the criteria of an independently motivated theoretical model.
Improved Boosting Algorithms using Confidence-Rated Predictions
We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…
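A sketch of one confidence-rated boosting round in the setting just described; the closed-form alpha assumes hypotheses with range in [-1, +1], and the function name is ours.

```python
import numpy as np

def adaboost_round(w, y, h):
    """One round of confidence-rated AdaBoost (Schapire-Singer style).
    y in {-1, +1}; h holds real-valued confidences in [-1, +1].
    For such h, alpha = 1/2 ln((1 + r)/(1 - r)), with r the weighted
    correlation, approximately minimizes the normalizer Z."""
    r = np.sum(w * y * h)                 # w assumed normalized to sum to 1
    alpha = 0.5 * np.log((1 + r) / (1 - r))
    w_new = w * np.exp(-alpha * y * h)
    return w_new / w_new.sum(), alpha
```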
Data mining criteria for tree-based regression and classification
TLDR: This paper proposes new splitting criteria for growing trees that are more adapted to data mining applications than conventional trees, and adopts a data mining point of view by proposing criteria that search for interesting subsets of the data.
Admissible probability measurement procedures
TLDR: In this case, it is shown that a certain minor modification of a scoring system with the reproducing property yields the desired admissible probability measurement procedure.