Corpus ID: 14229903

Bias Plus Variance Decomposition for Zero-One Loss Functions

@inproceedings{Kohavi1996BiasPV,
  title={Bias Plus Variance Decomposition for Zero-One Loss Functions},
  author={Ron Kohavi and David H. Wolpert},
  booktitle={ICML},
  year={1996}
}
We present a bias-variance decomposition of expected misclassification rate, the most commonly used loss function in supervised classification learning. The bias-variance decomposition for quadratic loss functions is well known and serves as an important tool for analyzing learning algorithms, yet no decomposition was offered for the more commonly used zero-one misclassification loss functions until the recent work of Kong & Dietterich and Breiman. Their decomposition suffers from some major shortcomings…
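For reference, the decomposition proposed here is usually quoted per test point x, averaging the zero-one loss over training sets. The LaTeX below is a sketch of that commonly quoted form; the notation P(Y_F = y | x) for the target's label distribution and P(Y_H = y | x) for the learner's prediction distribution is assumed here rather than taken verbatim from the paper:

\[
\mathbb{E}\bigl[\text{0-1 loss} \mid x\bigr] \;=\; \sigma_x^2 \;+\; \mathrm{bias}_x^2 \;+\; \mathrm{variance}_x,
\]
\[
\sigma_x^2 = \tfrac{1}{2}\Bigl(1 - \sum_y P(Y_F = y \mid x)^2\Bigr),\quad
\mathrm{bias}_x^2 = \tfrac{1}{2}\sum_y \bigl(P(Y_F = y \mid x) - P(Y_H = y \mid x)\bigr)^2,\quad
\mathrm{variance}_x = \tfrac{1}{2}\Bigl(1 - \sum_y P(Y_H = y \mid x)^2\Bigr).
\]

In this form all three terms are non-negative and sum to the expected misclassification rate at x; when the target is deterministic the noise term \(\sigma_x^2\) vanishes.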
A Unified Bias-Variance Decomposition and its Applications
TLDR
A unified bias-variance decomposition that is applicable to squared loss, zero-one loss, variable misclassification costs, and other loss functions is presented and applied to decision tree learning, instance-based learning, and boosting on a large suite of benchmark data sets.
A Unified Bias-Variance Decomposition
TLDR
This article defines bias and variance for an arbitrary loss function, and shows that the resulting decomposition specializes to the standard one for the squared-loss case, and to a close relative of Kong and Dietterich's (1995) one for the zero-one case.
A Unified Bias-Variance Decomposition for Zero-One and Squared Loss
TLDR
This paper defines bias and variance for an arbitrary loss function, and shows that the resulting decomposition specializes to the standard one for the squared-loss case, and to a close relative of Kong and Dietterich's (1995) one for the zero-one case.
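The unified decomposition referred to in the entries above is usually presented in terms of a main prediction and an optimal prediction. The following LaTeX restates that general form as a sketch; the symbols y_m, y_*, c_1, c_2 and the expectations over training sets D and targets t are assumed notation, not quoted from the papers:

\[
\mathbb{E}_{D,t}\bigl[L(t, y)\bigr] \;=\; c_1\,\mathbb{E}_t\bigl[L(t, y_*)\bigr] \;+\; L(y_*, y_m) \;+\; c_2\,\mathbb{E}_D\bigl[L(y_m, y)\bigr],
\]

where y_m is the main prediction (the single prediction minimizing expected loss against the learner's own predictions over training sets), y_* is the optimal prediction for the target distribution, and the three terms play the roles of noise, bias, and variance. Taking c_1 = c_2 = 1 recovers the standard squared-loss decomposition, while for zero-one loss the coefficients depend on whether the main prediction is correct, which is why variance can sometimes reduce error in that setting.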
Bias-Variance trade-off characterization in a classification problem: What differences with regression?
TLDR
Two major points of interest of this theoretical account of bias-variance decomposition are: first, that the notion of bias needs to be redefined for classification problems and, second, that given appropriate definitions of noise, bias, and variance, it is possible to unify the different decompositions in a general theoretical framework.
Variance and Bias for General Loss Functions
TLDR
This paper suggests an explicit list of rules that any “reasonable” set of definitions should satisfy and produces bias and variance definitions which generalize to any symmetric loss function.
Practical Bias Variance Decomposition
  • R. Bouckaert
    Australasian Conference on Artificial Intelligence
  • 2008
TLDR
This paper examines the various parameters and variants of empirical bias-variance decompositions through an extensive simulation study and recommends using ten-fold cross-validation as the sampling method, taking 100 samples within each fold with a test set size of at least 2000. (A minimal resampling sketch in this spirit appears after this list.)
Bias-Variance Decomposition for Ranking
TLDR
It is shown that ranking disagreements between true orderings and a ranking function can be decomposed into bias and variance components, akin to the decomposition previously studied for squared loss and other losses.
On Bias Plus Variance
This article presents several additive corrections to the conventional quadratic loss bias-plus-variance formula. One of these corrections is appropriate when both the target is not fixed (as in …
Bias-Variance Decomposition for model selection
TLDR
The conclusion is that under specific circumstances model selection based on the outcome of the bias-variance decomposition will lead to bad choices, mainly due to instability issues which increase when the overall error is close to zero.
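Several of the citing papers above, notably "Practical Bias Variance Decomposition", deal with estimating these terms empirically by repeatedly resampling training sets. The Python sketch below is a minimal, illustrative estimator for the zero-one case under a noise-free-target assumption; the dataset, learner (scikit-learn's DecisionTreeClassifier), and sample sizes are assumptions for illustration, not the procedure recommended in any of the papers listed.

# Illustrative sketch: empirical bias/variance estimation for zero-one loss
# via repeated resampling of training sets; learner and sizes are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_pool, y_pool = X[:2000], y[:2000]      # pool from which training sets are drawn
X_test, y_test = X[2000:], y[2000:]      # fixed evaluation set

n_resamples, train_size = 100, 500
n_classes = len(np.unique(y))
# P_hat[i, c] estimates P(learner predicts class c | test point i) over training sets
P_hat = np.zeros((len(X_test), n_classes))

for _ in range(n_resamples):
    idx = rng.choice(len(X_pool), size=train_size, replace=False)
    model = DecisionTreeClassifier().fit(X_pool[idx], y_pool[idx])
    P_hat[np.arange(len(X_test)), model.predict(X_test)] += 1.0 / n_resamples

# One-hot target distribution: assumes noise-free labels, so the noise term is zero
P_true = np.eye(n_classes)[y_test]

bias2 = 0.5 * np.mean(np.sum((P_true - P_hat) ** 2, axis=1))
variance = 0.5 * np.mean(1.0 - np.sum(P_hat ** 2, axis=1))
print(f"bias^2 ~ {bias2:.3f}  variance ~ {variance:.3f}  sum ~ {bias2 + variance:.3f}")

Under the noise-free assumption the printed sum approximates the learner's expected misclassification rate on the held-out points; in practice the resampling scheme (cross-validation folds, number of repeats, test-set size) affects the estimates, which is exactly the point of the Bouckaert study above.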

References

Showing 1-10 of 19 references
Improving regression estimation: Averaging methods for variance reduction with extensions to general convex measure optimization
TLDR
Experimental results are presented which demonstrate that the ensemble method dramatically improves regression performance on real-world classification tasks.
Bias, Variance, and Arcing Classifiers
TLDR
This work explores two arcing algorithms, compares them to each other and to bagging, and tries to understand how arcing, which is more successful than bagging at reducing variance, works.
Neural Networks and the Bias/Variance Dilemma
TLDR
It is suggested that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues.
Learning probabilistic relational concept descriptions
TLDR
This dissertation presents methods for increasing the accuracy of probabilistic classification rules learned from noisy, relational data and presents the system HYDRA, which implements the one-per-class approach.
The Relationship Between PAC, the Statistical Physics Framework, the Bayesian Framework, and the VC Framework
TLDR
A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem and on the Bayesian "Occam factors" argument for Occam's razor.
Elements of Information Theory
TLDR
The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Heuristics of instability in model selection
  • 1994