An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

@article{Dietterich2000AnEC,
  title={An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization},
  author={Thomas G. Dietterich},
  journal={Machine Learning},
  year={2000},
  volume={40},
  pages={139--157}
}
Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base” learning algorithm. [...] Key Result: In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.
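The three constructions the paper compares can be contrasted with off-the-shelf tools. Below is a minimal sketch, not the paper's experimental setup (which used C4.5): the dataset, tree depth, ensemble size, and the use of random feature subsets as a stand-in for Dietterich's split randomization are all illustrative assumptions.

```python
# Minimal sketch: bagging vs. boosting vs. randomization with scikit-learn.
# All hyperparameters here are illustrative choices, not the paper's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

ensembles = {
    # Bagging: each tree is trained on a bootstrap resample of the data.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(max_depth=3), n_estimators=50, random_state=0),
    # Boosting (AdaBoost): each tree is trained on reweighted data that
    # emphasizes examples the previous trees misclassified.
    "boosting": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=3), n_estimators=50, random_state=0),
    # Randomization: every tree sees all of the data, but split selection is
    # perturbed; random feature subsets stand in for Dietterich's "choose
    # uniformly among the best candidate splits" scheme.
    "randomization": BaggingClassifier(
        DecisionTreeClassifier(max_depth=3), n_estimators=50,
        bootstrap=False, max_features=0.5, random_state=0),
}

for name, clf in ensembles.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Note that bagging and randomization train each tree independently, while boosting's reweighting couples the members; this coupling is consistent with the key result above, since boosting keeps increasing the weight of noisy, hard-to-fit examples.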
A Comparison of Decision Tree Ensemble Creation Techniques
TLDR
An algorithm is introduced that decides when a sufficient number of classifiers has been created for an ensemble, and is shown to result in an accurate ensemble for those methods that incorporate bagging into the construction of the ensemble.
A Heuristically Perturbation of Dataset to Achieve a Diverse Ensemble of Classifiers
TLDR
It is shown that CDEBMTE can be effectively used to achieve higher accuracy and to obtain better class-membership probability estimates.
Decision Tree Simplification For Classifier Ensembles
TLDR
The performance of Bagging, Boosting and Error-Correcting Output Code (ECOC) is compared for five decision tree pruning methods and the influence of pruning on the performance of the ensembles is studied.
A Comparison of Ensemble Creation Techniques
TLDR
Bagging and six other randomization-based ensemble tree methods are evaluated and it is found that none of them is consistently more accurate than standard bagging when tested for statistical significance.
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
Classification is one of the data mining techniques that analyzes a given data set and induces a model for each class based on the features present in the data. Bagging and boosting are heuristic [...]
Construction of High-accuracy Ensemble of Classifiers
TLDR
The proposed approach uses Bagging and Boosting to generate base classifiers, selects the most accurate classifiers from clusters, and employs the Bagging generator to construct the final ensemble.
The role of margins in boosting and ensemble performance
TLDR
The role of margins in boosting and ensemble-method performance is examined; boosting in most instances yields lower generalization error than competing ensemble methodologies such as bagging and random forests.
Creating diverse ensemble classifiers to reduce supervision
TLDR
This thesis introduces a novel approach to active learning based on ACTIVEDECORATE, which uses Jensen-Shannon divergence (a similarity measure for probability distributions) to improve the selection of training examples for optimizing probability estimation.
An empirical comparison of ensemble methods based on classification trees
TLDR
An empirical comparison of the classification error of several ensemble methods based on classification trees is performed by using 14 data sets that are publicly available and that were used by Lim, Loh and Shih in 2000.

References

Showing 1-10 of 20 references
An Empirical Evaluation of Bagging and Boosting
TLDR
The results clearly show that even though Bagging almost always produces a better classifier than any of its individual component classifiers and is relatively impervious to overfitting, it does not generalize any better than a baseline neural-network ensemble method.
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
TLDR
It is found that Bagging improves when probabilistic estimates are used in conjunction with no-pruning, as well as when the data is backfit, and that Arc-x4 behaves differently from AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference.
Bagging, Boosting, and C4.5
TLDR
Results of applying Breiman's bagging and Freund and Schapire's boosting to a system that learns decision trees, tested on a representative collection of datasets, show that boosting delivers the greater benefit.
Experiments with a New Boosting Algorithm
TLDR
This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.
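For orientation, the AdaBoost.M1 update studied in that paper can be stated compactly (notation is mine; see Freund and Schapire for the full algorithm):

```latex
% AdaBoost.M1, round t: h_t is the weak hypothesis, w_i^{(t)} the weight of
% example i (weights sum to 1), and \epsilon_t the weighted training error.
\epsilon_t = \sum_{i:\,h_t(x_i)\neq y_i} w_i^{(t)}, \qquad
\beta_t = \frac{\epsilon_t}{1-\epsilon_t}, \qquad
w_i^{(t+1)} \propto w_i^{(t)}\cdot
  \begin{cases}\beta_t & \text{if } h_t(x_i)=y_i\\ 1 & \text{otherwise;}\end{cases}
\qquad
H(x) = \arg\max_{y}\,\sum_{t:\,h_t(x)=y}\log\frac{1}{\beta_t}.
```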
Bias, Variance, and Arcing Classifiers
TLDR
This work explores two arcing algorithms, compares them to each other and to bagging, and tries to understand how arcing works; arcing proves more successful than bagging at variance reduction.
Bagging predictors
TLDR
Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms
TLDR
This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task and measures the power (ability to detect algorithm differences when they do exist) of these tests.
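One of the tests that article reviews, McNemar's test, is simple to compute from two classifiers' per-example outcomes on a shared held-out test set. A minimal sketch (the outcome arrays here are hypothetical):

```python
# McNemar's test: compare two classifiers from their disagreements on the
# same test set. The boolean outcome arrays below are made-up examples.
import numpy as np
from scipy.stats import chi2

a_correct = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1], dtype=bool)  # algorithm A
b_correct = np.array([1, 0, 0, 1, 1, 1, 0, 0, 1, 0], dtype=bool)  # algorithm B

n01 = np.sum(~a_correct & b_correct)  # A wrong, B right
n10 = np.sum(a_correct & ~b_correct)  # A right, B wrong

# Statistic with continuity correction; approximately chi^2 with 1 degree of
# freedom under the null hypothesis that both algorithms err at the same rate.
stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
p_value = chi2.sf(stat, df=1)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```

Only the examples on which the two classifiers disagree carry information here, which is why the statistic depends solely on n01 and n10.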
A Comparison of Methods for Learning and Combining Evidence From Multiple Models
TLDR
This work presents a comparison of three ways of learning multiple models on 29 data sets from the UCI repository, compares four evidence-combination methods, and characterizes the kinds of data sets for which each method works best.
Option Decision Trees with Majority Votes
TLDR
The goal was to explore when option nodes are most useful and to control the growth of the trees so that additional complexity of little utility is limited; the results show that, for the tested problems, reductions in error rates can be achieved.
Data Mining Using MLC++, a Machine Learning Library in C++
TLDR
A system called MLC++ is described, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest.