Random Forests

@article{Breiman2001RandomF,
  title={Random Forests},
  author={Leo Breiman},
  journal={Machine Learning},
  year={2001},
  volume={45},
  pages={5-32}
}
  • L. Breiman
  • Published 2001
  • Mathematics, Computer Science
  • Machine Learning
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features…
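To make the two sources of randomness in the abstract concrete, here is a minimal sketch in Python (not Breiman's original implementation), assuming scikit-learn is available and that class labels are non-negative integers: each tree is grown on an independent bootstrap sample, and each split considers a random subset of the features.

```python
# Minimal sketch of the random forest idea; assumes scikit-learn and
# integer class labels 0..K-1. Function names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, n_trees=100, rng=np.random.default_rng(0)):
    """Each tree gets its own bootstrap sample, and each split considers a
    random subset of the features (max_features="sqrt")."""
    trees, n = [], len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                      # bootstrap sample
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict(trees, X):
    """Majority vote over the ensemble."""
    votes = np.stack([t.predict(X) for t in trees])           # (n_trees, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Adding more such trees does not overfit in the number of trees; as the abstract notes, the generalization error converges to a limit governed by the strength of, and correlation between, the individual trees.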
Citations

Ensemble of optimal trees, random forest and random projection ensemble classification
This work investigates the idea of integrating trees that are accurate and diverse, using out-of-bag observations as a validation sample from the training bootstrap samples to choose the best trees based on their individual performance, and assessing these trees for diversity using the Brier score on an independent validation sample.
Double random forest
To produce bigger trees than those grown by RF, a new classification ensemble method called double random forest (DRF) is proposed; it draws a bootstrap sample at each node during tree construction, instead of bootstrapping only once at the root node as in RF.
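As a rough illustration of the node-level bootstrapping described in this entry, the toy sketch below chooses each split on a fresh bootstrap sample of the observations that reach that node. The helper names (grow_drf_tree, predict_one) are ours, and the split search simply reuses a depth-1 scikit-learn tree, so this is a sketch of the idea rather than the paper's algorithm; integer class labels are assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_drf_tree(X, y, rng, min_samples=5, max_features="sqrt"):
    """Return a nested dict representing one tree; the split at every node is
    chosen on a fresh bootstrap sample of the data that reached that node."""
    if len(np.unique(y)) == 1 or len(y) < min_samples:
        return {"leaf": np.bincount(y).argmax()}
    idx = rng.integers(0, len(y), size=len(y))        # node-level bootstrap
    stump = DecisionTreeClassifier(max_depth=1, max_features=max_features,
                                   random_state=int(rng.integers(1 << 31)))
    stump.fit(X[idx], y[idx])
    feat = stump.tree_.feature[0]
    if feat < 0:                                       # bootstrap sample was pure
        return {"leaf": np.bincount(y).argmax()}
    thr = stump.tree_.threshold[0]
    left = X[:, feat] <= thr
    if left.all() or (~left).all():                    # degenerate split on the original data
        return {"leaf": np.bincount(y).argmax()}
    return {"feat": feat, "thr": thr,
            "left": grow_drf_tree(X[left], y[left], rng, min_samples, max_features),
            "right": grow_drf_tree(X[~left], y[~left], rng, min_samples, max_features)}

def predict_one(node, x):
    while "leaf" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["leaf"]
```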
An Ensemble of Optimal Trees for Classification and Regression (OTE)
This work investigates the idea of integrating trees that are accurate and diverse, utilizing out-of-bag observations as a validation sample from the training bootstrap samples to choose the best trees based on their individual performance, and then assessing these trees for diversity using the Brier score.
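A hedged sketch of the selection procedure described in this and the earlier OTE-style entry: grow candidate trees on bootstrap samples, rank them by individual out-of-bag accuracy, and then keep a tree only if it lowers the ensemble's Brier score on an independent validation sample. The function names and the keep_frac parameter are our simplifications, and binary labels {0, 1} with both classes present in every bootstrap sample are assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def brier(prob_pos, y):
    """Brier score for binary labels in {0, 1}."""
    return np.mean((prob_pos - y) ** 2)

def select_optimal_trees(X_tr, y_tr, X_val, y_val, n_candidates=200,
                         keep_frac=0.5, rng=np.random.default_rng(0)):
    n, candidates = len(X_tr), []
    for _ in range(n_candidates):
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)          # out-of-bag observations
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(X_tr[idx], y_tr[idx])
        oob_acc = tree.score(X_tr[oob], y_tr[oob]) if len(oob) else 0.0
        candidates.append((oob_acc, tree))
    # Phase 1: keep the individually strongest trees (by OOB accuracy).
    candidates.sort(key=lambda t: t[0], reverse=True)
    ranked = [t for _, t in candidates[: int(keep_frac * n_candidates)]]
    # Phase 2: add a tree only if it improves the validation Brier score.
    selected, probs, best = [], [], np.inf
    for tree in ranked:
        p = tree.predict_proba(X_val)[:, 1]            # P(class 1)
        score = brier(np.mean(probs + [p], axis=0), y_val)
        if score < best:
            best = score
            selected.append(tree)
            probs.append(p)
    return selected
```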
Improvement of randomized ensembles of trees for supervised learning in very high dimension
Tree-based ensemble methods, such as random forests and extremely randomized trees, are methods of choice for handling high-dimensional problems. One important drawback of these methods, however, is…
On the selection of decision trees in Random Forests
It is shown that better subsets of decision trees can be obtained even using a sub-optimal classifier selection method, which proves that the "classical" RF induction process, in which randomized trees are arbitrarily added to the ensemble, is not the best approach to produce accurate RF classifiers.
Trees Weighting Random Forest Method for Classifying High-Dimensional Noisy Data
This paper presents a new approach, named Trees Weighting Random Forest (TWRF), that addresses the problem of noisy trees in a random forest by weighting the trees according to their classification ability.
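In the spirit of the tree-weighting idea above, the following sketch weights each tree's vote by its out-of-bag accuracy so that poorly performing (noisy) trees contribute less; the exact weighting scheme here is an assumption, not necessarily the one used by TWRF, and integer class labels 0..K-1 are assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_weighted_forest(X, y, n_trees=100, rng=np.random.default_rng(0)):
    n, trees, weights = len(X), [], []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
        # Weight = out-of-bag accuracy of the individual tree.
        weights.append(tree.score(X[oob], y[oob]) if len(oob) else 0.0)
    return trees, np.asarray(weights)

def predict_weighted(trees, weights, X, n_classes):
    votes = np.zeros((len(X), n_classes))
    for tree, w in zip(trees, weights):
        votes[np.arange(len(X)), tree.predict(X)] += w   # weighted vote per class
    return votes.argmax(axis=1)
```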
Random Forests
Tree-based classification methods are widely applied tools in machine learning that create a taxonomy of the space of objects to classify. Nonetheless, their workings are not documented to high…
Classification and Regression by randomForest
Random forests add an additional layer of randomness to bagging and are robust against overfitting; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.
Random Forests and Decision Trees Classifiers: Effects of Data Quality on the Learning Curve
It appeared that random forests and individual decision trees have different sensitivities to the studied perturbation factors; counterintuitively, random forests show a greater sensitivity to noise than decision trees.
Is rotation forest the best classifier for problems with continuous features?
It is demonstrated that on large problems rotation forest can be made an order of magnitude faster without significant loss of accuracy, and it is maintained that, without any domain knowledge to indicate an algorithm preference, rotation forest should be the default algorithm of choice for problems with continuous attributes.
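For readers unfamiliar with rotation forest, the sketch below shows its core idea under simplifying assumptions: features are split into random subsets, each subset is rotated by a PCA fitted on a bootstrap sample, and an unpruned tree is grown on the rotated data. Several refinements of the published algorithm (class subsampling before PCA, mean-centering of the rotated features) are omitted, and integer class labels 0..K-1 are assumed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotation_forest(X, y, n_trees=10, n_subsets=3, rng=np.random.default_rng(0)):
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        perm = rng.permutation(d)
        subsets = np.array_split(perm, n_subsets)       # random feature subsets
        R = np.zeros((d, d))                            # block "rotation" matrix
        for cols in subsets:
            idx = rng.integers(0, n, size=n)            # bootstrap sample for this block
            pca = PCA().fit(X[idx][:, cols])
            R[np.ix_(cols, cols)] = pca.components_.T
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        tree.fit(X @ R, y)                              # grow the tree on rotated features
        forest.append((R, tree))
    return forest

def predict_rotation_forest(forest, X):
    probs = np.mean([tree.predict_proba(X @ R) for R, tree in forest], axis=0)
    return probs.argmax(axis=1)
```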

References

Showing 1-10 of 22 references
The Random Subspace Method for Constructing Decision Forests
  • T. Ho
  • Mathematics, Computer Science
  • IEEE Trans. Pattern Anal. Mach. Intell.
  • 1998
A method is proposed to construct a decision-tree-based classifier that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity.
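A minimal sketch of the random subspace method described in this reference, assuming scikit-learn and integer class labels: each tree is grown on the full training set but sees only a randomly chosen subset of the features, and predictions are combined by majority vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_subspace_forest(X, y, n_trees=100, subspace_frac=0.5,
                               rng=np.random.default_rng(0)):
    d = X.shape[1]
    k = max(1, int(subspace_frac * d))
    forest = []
    for _ in range(n_trees):
        feats = rng.choice(d, size=k, replace=False)     # random feature subspace
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        tree.fit(X[:, feats], y)                         # full training set, subset of features
        forest.append((feats, tree))
    return forest

def predict_subspace_forest(forest, X):
    votes = np.stack([tree.predict(X[:, feats]) for feats, tree in forest])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```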
SOME INFINITY THEORY FOR PREDICTOR ENSEMBLES
To dispel some of the mystery about what makes tree ensembles work, they are looked at in distribution space, i.e., the limit case of "infinite" sample size. It is shown that the simplest kind of trees…
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
It is found that Bagging improves when probabilistic estimates are used in conjunction with no-pruning, as well as when the data is backfit, and that Arc-x4 behaves differently from AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference.
An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization
The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting; with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.
On the Algorithmic Implementation of Stochastic Discrimination
  • E. Kleinberg
  • Computer Science
  • IEEE Trans. Pattern Anal. Mach. Intell.
  • 2000
The underlying mathematical theory of stochastic discrimination is outlined, and a remark concerning boosting is made that provides a theoretical justification for properties of that method observed in practice, including its ability to generalize.
Boosting the margin: A new explanation for the effectiveness of voting methods
It is shown that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error.
Shape Quantization and Recognition with Randomized Trees
A new approach to shape recognition is presented, based on a virtually infinite family of binary features (queries) of the image data, designed to accommodate prior information about shape invariance and regularity; a comparison with artificial neural network methods is also given.
An Efficient Method To Estimate Bagging's Generalization Error
This paper presents several techniques for estimating the generalization error of a bagged learning algorithm without invoking yet more training of the underlying learning algorithm (beyond that of the bagging itself), as is required by cross-validation-based estimation.
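In the spirit of this reference, the sketch below estimates a bagged ensemble's generalization error without any training beyond the bagging itself: each training point is predicted only by the trees whose bootstrap sample did not contain it (an out-of-bag estimate). Names and details are illustrative; integer class labels are assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_oob_error(X, y, n_trees=100, rng=np.random.default_rng(0)):
    n = len(X)
    boot_idx, trees = [], []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])
        boot_idx.append(set(idx.tolist()))
        trees.append(tree)
    errors, counted = 0, 0
    for i in range(n):
        # Vote only among trees that never saw example i during training.
        preds = [t.predict(X[i:i + 1])[0]
                 for t, s in zip(trees, boot_idx) if i not in s]
        if not preds:
            continue                     # rare: example i was in every bootstrap sample
        counted += 1
        errors += (np.bincount(preds).argmax() != y[i])
    return errors / counted if counted else float("nan")
```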
Arcing classifier (with discussion and a rejoinder by the author)
Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. One of the more effective is bagging. Here, modified…
Experiments with a New Boosting Algorithm
This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.