Extremely randomized trees

  • Pierre Geurts, Damien Ernst, Louis Wehenkel
  • Machine Learning
  • 2006
This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It consists essentially of strongly randomizing both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics through the choice of a single parameter. We evaluate the robustness of the…
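
The split rule the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are mine, and the score is plain Gini decrease. Each of K candidates pairs a random attribute with a cut-point drawn uniformly over that attribute's range; with K = 1 the chosen split never looks at the outputs, giving the totally randomized trees mentioned above.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def extra_trees_split(X, y, k=1, rng=random):
    """Draw k random (attribute, cut-point) candidates and keep the one
    with the largest Gini decrease. With k = 1 the split ignores y."""
    n = len(y)
    best = None
    for _ in range(k):
        a = rng.randrange(len(X[0]))
        lo = min(row[a] for row in X)
        hi = max(row[a] for row in X)
        cut = rng.uniform(lo, hi)
        left = [y[i] for i, row in enumerate(X) if row[a] < cut]
        right = [y[i] for i, row in enumerate(X) if row[a] >= cut]
        score = gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)
        if best is None or score > best[0]:
            best = (score, a, cut)
    return best[1], best[2]  # (attribute index, cut-point)
```

Raising k moves the procedure from totally randomized trees toward classical greedy splitting, which is exactly the tunable strength of randomization the paper studies.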

Improvement of randomized ensembles of trees for supervised learning in very high dimension

Empirical experiments show that combining the monotone LASSO with features extracted from tree ensembles both drastically reduces the number of features and can improve accuracy with respect to unpruned ensembles of trees.

Learning with Ensembles of Randomized Trees : New Insights

A connection with kernel target alignment, a measure of kernel quality, is pointed out, which suggests that randomization is a way to obtain a high alignment, leading to possibly low generalization error.

Towards generating random forests via extremely randomized trees

The results on several public datasets show that random partition without exhaustive search at each node of a decision tree can yield better performance with less computational complexity.

Influence of Hyperparameters on Random Forest Accuracy

This work evaluates the Forest-RI algorithm on several machine learning problems with different settings of K in order to understand how this parameter affects RF performance, and shows that the default values of K traditionally used in the literature are globally near-optimal, except for a few cases in which they are significantly sub-optimal.

Random Forests with Stochastic Induction of Decision Trees

The proposed algorithm uses a stochastic process to induce each decision tree, assigning a probability to the selection of the split attribute at every tree node, designed to create strong and mutually independent trees.
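
The probabilistic attribute selection described above can be sketched as a roulette-wheel draw. This is an assumption-laden illustration (the helper name and the use of raw scores as weights are mine, not the paper's): instead of always taking the arg-max attribute, attribute j is drawn with probability proportional to its split score, which keeps strong attributes likely while decorrelating the trees.

```python
import random

def sample_split_attribute(scores, rng=random):
    """Roulette-wheel draw: attribute j is chosen with probability
    scores[j] / sum(scores) rather than deterministically by arg-max."""
    total = sum(scores)
    r = rng.uniform(0.0, total)
    acc = 0.0
    for j, s in enumerate(scores):
        acc += s
        if r <= acc:
            return j
    return len(scores) - 1  # guard against floating-point round-off
```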

Embedding Monte Carlo Search of Features in Tree-Based Ensemble Methods

A general scheme to embed feature generation in a wide range of tree-based learning algorithms, including single decision trees, random forests and tree boosting, based on the formalization of feature construction as a sequential decision making problem addressed by a tractable Monte Carlo search algorithm coupled with node splitting is proposed.


This paper proves a consistency result for Breiman’s original algorithm in the context of additive regression models and sheds an interesting light on how random forests can nicely adapt to sparsity.

AMF: Aggregated Mondrian forests for online learning

AMF, an online random forest algorithm based on Mondrian Forests, is introduced. It uses a variant of the context tree weighting algorithm to obtain a truly online, parameter-free algorithm that is competitive with the optimal pruning of the Mondrian tree and thus adapts to the unknown regularity of the regression function.

An Empirical Comparison of Supervised Ensemble Learning Approaches

We present an extensive empirical comparison between twenty prototypical supervised ensemble learning algorithms, including Boosting, Bagging, Random Forests, Rotation Forests, Arc-X4, …

An extensive empirical comparison of ensemble learning methods for binary classification

We present an extensive empirical comparison between nineteen prototypical supervised ensemble learning algorithms, including Boosting, Bagging, Random Forests, Rotation Forests, Arc-X4, …

The Random Subspace Method for Constructing Decision Forests

  • T. Ho
  • Computer Science
    IEEE Trans. Pattern Anal. Mach. Intell.
  • 1998
A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity.
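
Ho's core idea, growing each tree in a randomly chosen feature subspace and letting the forest vote, can be sketched as below. This is a minimal sketch under that description, not Ho's code; the helper names are mine, and only the subspace drawing and projection steps are shown.

```python
import random

def random_subspaces(n_features, dim, n_trees, rng=random):
    """Draw one feature subset per tree; each tree is then grown using
    only its own subset, and the ensemble votes at prediction time."""
    return [sorted(rng.sample(range(n_features), dim)) for _ in range(n_trees)]

def project(X, features):
    """Restrict each sample of X to the chosen feature subset."""
    return [[row[j] for j in features] for row in X]
```

Because every tree sees a different projection of the data, the trees disagree in useful ways, which is how the method keeps improving generalization as the forest grows in complexity.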

An Empirical Comparison of Selection Measures for Decision-Tree Induction

The paper considers a number of different measures and experimentally examines their behavior in four domains and shows that the choice of measure affects the size of a tree but not its accuracy, which remains the same even when attributes are selected randomly.

Approximate Splitting for Ensembles of Trees using Histograms

The approach combines the ideas behind discretization through histograms and randomization in ensembles to create decision trees by randomly selecting a split point in an interval around the best bin boundary in the histogram.
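
The histogram-plus-randomization idea above can be sketched as follows. This is an illustrative approximation, not the paper's algorithm: the function name is mine, bins are equal-width, and the score is Gini decrease. Only bin boundaries are scored, and the actual cut is then drawn uniformly from an interval around the best boundary rather than taken exactly.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def histogram_split(values, labels, n_bins=4, rng=random):
    """Score only the equal-width bin boundaries, then draw the cut
    uniformly from a one-bin-wide interval around the best boundary."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    n = len(labels)
    best_b, best_score = None, float("-inf")
    for i in range(1, n_bins):
        b = lo + i * width
        left = [labels[j] for j, v in enumerate(values) if v < b]
        right = [labels[j] for j, v in enumerate(values) if v >= b]
        score = gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)
        if score > best_score:
            best_b, best_score = b, score
    return rng.uniform(best_b - width / 2, best_b + width / 2)
```

Scoring n_bins - 1 boundaries instead of every distinct value is where the computational saving comes from; the random draw around the winner supplies the ensemble diversity.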

An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting, while with substantial classification noise, bagging is much better than boosting and sometimes better than randomization.

A further comparison of splitting rules for decision-tree induction

The results indicate that random splitting leads to increased error and are at variance with those presented by Mingers.

Random Forests

Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
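
The internal error monitor mentioned above is the out-of-bag estimate, which can be sketched as below. This is a toy sketch, not Breiman's implementation: a single-feature decision stump stands in for a full tree, labels are assumed binary 0/1, and all names are mine. Each base learner is fit on a bootstrap sample, and every point is scored only by the learners whose bootstrap missed it.

```python
import random

def fit_stump(X, y):
    """Best single-feature threshold stump by misclassification count."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for a in range(len(X[0])):
        vals = sorted(set(row[a] for row in X))
        for t in [(u + v) / 2 for u, v in zip(vals, vals[1:])]:
            for ll, rl in ((0, 1), (1, 0)):
                errs = sum((ll if row[a] < t else rl) != yi
                           for row, yi in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, a, t, ll, rl)
    if best is None:  # degenerate bootstrap: predict the majority class
        maj = max(set(y), key=y.count)
        return (0, float("-inf"), maj, maj)
    return best[1:]

def predict(stump, row):
    a, t, ll, rl = stump
    return ll if row[a] < t else rl

def oob_error(X, y, n_trees=50, rng=random):
    """Fit each stump on a bootstrap sample; score every point only with
    the stumps whose bootstrap missed it (the out-of-bag votes)."""
    n = len(X)
    votes = [[] for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        for i in set(range(n)) - set(idx):
            votes[i].append(predict(stump, X[i]))
    wrong = sum(1 for i in range(n)
                if votes[i] and max(set(votes[i]), key=votes[i].count) != y[i])
    scored = sum(1 for v in votes if v)
    return wrong / scored if scored else 0.0
```

The estimate comes for free during training, which is why Breiman can use it to monitor error, strength, and correlation without a held-out set.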


It is shown that the simplest kind of trees is complete in D-dimensional space if the number of terminal nodes T is greater than D, and that the AdaBoost minimization algorithm gives an ensemble converging to the Bayes risk.

PERT – Perfect Random Tree Ensembles

This work introduces a new ensemble method, PERT, in which each individual classifier is a perfectly-fit classification tree with random selection of splits, and shows that PERT is fitting a continuous posterior probability surface for each class.

An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

It is found that Bagging improves when probabilistic estimates are used in conjunction with no-pruning, as well as when the data is backfit, and that Arc-x4 behaves differently from AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference.