An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

  • Eric Bauer and Ron Kohavi
  • Machine Learning
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation… 

An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting; in situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.

adabag: An R Package for Classification with Boosting and Bagging

In this paper, the adabag R package is introduced and AdaBoost.M1, SAMME and bagging algorithms with classification trees as base classifiers are implemented.

An experimental study on diversity for bagging and boosting with linear classifiers

Randomized ensemble methods for classification trees

Two methods of constructing ensembles of classifiers are proposed: one directly injects randomness into classification tree algorithms by choosing a split at random at each node, with probability proportional to each candidate split's measure of goodness; the other perturbs the output and constructs a classifier from the perturbed data.
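
The first method's random split selection can be sketched as roulette-wheel sampling; `splits` and `goodness` are hypothetical stand-ins for a node's candidate splits and their goodness scores (e.g. information gain):

```python
import random

def pick_random_split(splits, goodness, rng):
    """Choose one candidate split at random, with probability
    proportional to its goodness score (roulette-wheel selection)."""
    total = sum(goodness)
    r = rng.uniform(0.0, total)
    acc = 0.0
    for split, g in zip(splits, goodness):
        acc += g
        if r <= acc:
            return split
    return splits[-1]  # guard against floating-point round-off

# A split with three times the goodness is drawn roughly three times as often.
rng = random.Random(0)
draws = [pick_random_split(["x1 < 2.0", "x2 < 5.0"], [3.0, 1.0], rng)
         for _ in range(4000)]
```

Unlike always taking the single best split, this keeps the trees in the ensemble diverse while still favouring good splits.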

A Comparison of Decision Tree Ensemble Creation Techniques

An algorithm is introduced that decides when a sufficient number of classifiers has been created for an ensemble, and is shown to result in an accurate ensemble for those methods that incorporate bagging into the construction of the ensemble.

An empirical comparison of ensemble methods based on classification trees

An empirical comparison of the classification error of several ensemble methods based on classification trees is performed by using 14 data sets that are publicly available and that were used by Lim, Loh and Shih in 2000.

Bagging and Boosting

Bagging and boosting are examples of ensemble learning methods from data mining that combine the predictions of many different models into an ensemble prediction, but the resulting model is not as interpretable as the constituent models because they average over a large collection of predictions.
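
The bagging half of that description can be sketched in a few lines; `fit_1nn` is a hypothetical base learner (1-nearest-neighbour on a scalar feature) used only to make the example self-contained:

```python
import random
from collections import Counter

def bagging_predict(train, x, fit, n_models=25, seed=0):
    """Fit each base model on a bootstrap resample of `train`
    (n draws with replacement) and majority-vote their predictions."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]  # bootstrap resample
        votes.append(fit(sample)(x))
    return Counter(votes).most_common(1)[0][0]

def fit_1nn(sample):
    """Toy base learner: predict the label of the nearest sampled point."""
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

train = [(0, 'a'), (1, 'a'), (2, 'a'), (10, 'b'), (11, 'b'), (12, 'b')]
```

Each model sees a different resample, so the vote averages away individual models' variance, which is exactly why the ensemble is less interpretable than any one constituent model.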

Enhanced Bagging (eBagging): A Novel Approach for Ensemble Learning

A novel modified version of bagging, named enhanced Bagging (eBagging), is proposed; it uses a new mechanism (error-based bootstrapping) when constructing training sets in order to cope with the problem of purely random sample selection in ensemble learning.
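
The abstract does not spell out the mechanism, but one plausible reading of "error-based bootstrapping" is that previously misclassified points get a higher sampling weight; the `boost` factor below is a hypothetical parameter, not one from the paper:

```python
import random

def error_weighted_sample(data, misclassified, rng, boost=3.0):
    """Draw a bootstrap sample in which points the current ensemble got
    wrong (misclassified[i] == True) are `boost` times more likely to be
    drawn than points it got right."""
    weights = [boost if wrong else 1.0 for wrong in misclassified]
    return rng.choices(data, weights=weights, k=len(data))
```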

Predictive Ensemble Modelling: Experimental Comparison of Boosting Implementation Methods

This paper presents an empirical comparison of boosting implemented by reweighting and by resampling, and finds that the complexity of the chosen ensemble technique and boosting method does not necessarily lead to better performance.

Experiments with a New Boosting Algorithm

This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.
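
A minimal from-scratch sketch of AdaBoost by reweighting, using threshold "decision stumps" on a single scalar feature as the weak learner (the paper's experiments use richer base classifiers):

```python
import math

def adaboost(X, y, rounds=10):
    """AdaBoost by reweighting: each round fits the weighted-error-minimizing
    threshold stump, then up-weights the examples that stump got wrong.
    X is a list of scalars, y a list of +1/-1 labels."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, sign)
    for _ in range(rounds):
        # stump h(x) = s if x > t else -s; pick the one with least weighted error
        err, t, s = min(
            ((sum(wi for wi, xi, yi in zip(w, X, y)
                  if (sg if xi > th else -sg) != yi), th, sg)
             for th in X for sg in (+1, -1)),
            key=lambda e: e[0])
        err = max(err, 1e-10)          # avoid log(0) on a perfect stump
        if err >= 0.5:
            break                      # no weak learner better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # reweight: misclassified points gain weight, then renormalize
        w = [wi * math.exp(-alpha * yi * (s if xi > t else -s))
             for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]

    def predict(x):
        score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
        return 1 if score >= 0 else -1
    return predict
```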

Arcing Classifiers

Two arcing algorithms are explored, they are compared to each other and to bagging, and the definitions of bias and variance for a classifier as components of the test set error are introduced.

Boosting the margin: A new explanation for the effectiveness of voting methods

It is shown that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error.
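
The margin of a voting classifier on a single example, the quantity this analysis centres on, can be computed directly: a positive margin means the weighted vote is correct, and its size measures the vote's confidence:

```python
from collections import defaultdict

def voting_margin(votes, weights, true_label):
    """Margin of a weighted vote: the weight mass on the true label minus
    the largest mass on any other label, normalized by total weight."""
    total = sum(weights)
    mass = defaultdict(float)
    for v, w in zip(votes, weights):
        mass[v] += w / total
    wrong = max((m for lbl, m in mass.items() if lbl != true_label),
                default=0.0)
    return mass[true_label] - wrong
```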

Bagging, Boosting, and C4.5

Results of applying Breiman's bagging and Freund and Schapire's boosting to a system that learns decision trees, tested on a representative collection of datasets, show that boosting provides the greater benefit.

Boosting and Naive Bayesian learning

It is shown that boosting applied to naive Bayesian classifiers yields combination classifiers that are representationally equivalent to standard feedforward multilayer perceptrons, which are highly plausible computationally as models of animal learning.

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection

The results indicate that for real-world datasets similar to the authors', the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
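
A minimal sketch of the stratified fold assignment the study recommends: each class's examples are dealt round-robin across the k folds, so every fold preserves the overall class proportions:

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign each example to one of k folds so that every fold has
    (almost) the same class proportions as the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [None] * len(labels)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):  # deal this class round-robin
            fold_of[i] = j % k
    return fold_of
```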

On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality

  • J. Friedman
  • Mathematics
    Data Mining and Knowledge Discovery
  • 2004
This work can dramatically mitigate the effect of the bias associated with some simple estimators like "naive" Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures.

Boosting Decision Trees

A constructive, incremental learning system for regression problems that models data by means of locally linear experts that do not compete for data during learning; asymptotic results are derived for this method.

A decision-theoretic generalization of on-line learning and an application to boosting

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
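
The multiplicative weight-update rule at the heart of that model can be sketched in a few lines; `eta` is the learning rate, and per-round expert losses are assumed to lie in [0, 1]:

```python
import math

def hedge(losses, eta=0.5):
    """Multiplicative weight update (Hedge): after each round, every
    expert's weight is multiplied by exp(-eta * loss), so experts with low
    cumulative loss come to dominate the mixture. `losses` is a list of
    rounds, each a list of per-expert losses. Returns final normalized weights."""
    n = len(losses[0])
    w = [1.0] * n
    for round_losses in losses:
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, round_losses)]
    z = sum(w)
    return [wi / z for wi in w]
```

AdaBoost arises from essentially this rule with the roles inverted: the "experts" are training examples, and weight concentrates on the hard ones.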

Arcing the edge

A framework for understanding arcing algorithms is defined and a relation is derived between the optimal reduction in the maximum value of the edge and the PAC concept of weak learner.