Generating Compact Tree Ensembles via Annealing

Gitesh Dawer, Yangzi Guo, Adrian Barbu
2020 International Joint Conference on Neural Networks (IJCNN)
Tree ensembles are flexible predictive models that can capture relevant variables and to some extent their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and obtains a more… 

Compressing Random Forests

This work introduces a novel method for lossless compression of tree-based ensemble methods, focusing on Random Forests, based on probabilistic modeling of the ensemble's trees, followed by model clustering via Bregman divergence.

Random Forests

Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

Some Enhancements of Decision Tree Bagging

A very simple discretization procedure is proposed, resulting in a dramatic speedup without significant decrease in accuracy, and a new method is proposed to prune an ensemble of trees in a combined fashion, which is significantly more effective than individual pruning.

Global refinement of random forest

The proposed global refinement jointly relearns the leaf nodes of all trees under a global objective function so that the complementary information between multiple trees is well exploited and the fitting power of the forest is significantly enhanced.

Special Invited Paper-Additive logistic regression: A statistical view of boosting

This work shows that this seemingly mysterious phenomenon of boosting can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood, and develops more direct approximations and shows that they exhibit nearly identical results to boosting.


General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements

Greedy function approximation: A gradient boosting machine.

A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.

A decision-theoretic generalization of on-line learning and an application to boosting

The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.

L1-based compression of random forest models

It is shown experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible, and an L1-based regularization of the set of indicator functions defined by all their nodes is proposed.

On Model Selection Consistency of Lasso

It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.