• Corpus ID: 239024639

A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

  • Yan Shuo Tan, Abhineet Agarwal, Bin Yu
Decision trees are important both as interpretable models amenable to high-stakes decision-making and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well understood. The most cited prior works have focused on deriving pointwise consistency guarantees for CART in a classical nonparametric regression setting. We take a different approach and advocate studying the generalization performance of decision trees with…
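The setting the abstract describes — data generated from a sparse additive model, then fit with CART — can be sketched as follows. This is a hedged illustration only: the specific additive components, dimensions, and the use of scikit-learn's `DecisionTreeRegressor` as the CART fit are assumptions for the sketch, not the paper's exact experiment.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def additive_response(X):
    # A sparse additive model: only 3 of the 10 features matter,
    # and the response is a sum of univariate component functions.
    return np.sin(X[:, 0]) + X[:, 1] ** 2 + np.abs(X[:, 2])

n, d = 500, 10
X_train = rng.uniform(-1, 1, size=(n, d))
y_train = additive_response(X_train) + 0.1 * rng.standard_normal(n)

# Noiseless test targets, so test MSE measures generalization error directly.
X_test = rng.uniform(-1, 1, size=(5000, d))
y_test = additive_response(X_test)

tree = DecisionTreeRegressor(min_samples_leaf=5).fit(X_train, y_train)
mse = np.mean((tree.predict(X_test) - y_test) ** 2)
print(f"test MSE of CART on additive-model data: {mse:.3f}")
```

The paper's point is precisely that on such additive data a single greedy tree generalizes poorly relative to methods that exploit the additive structure; the sketch only sets up the experiment, not the lower-bound argument.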


Fast Interpretable Greedy-Tree Sums (FIGS)
FIGS generalizes the CART algorithm to grow a flexible number of trees simultaneously in a summation; it is able to avoid repeated splits and often provides more concise decision rules than fitted decision trees, without sacrificing predictive performance.
Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods
Hierarchical Shrinkage (HS) is introduced: a post-hoc algorithm that does not modify the tree structure but instead regularizes the tree by shrinking the prediction at each node toward the sample means of its ancestors.
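A minimal sketch of that shrinkage rule, applied post hoc to a fitted scikit-learn tree: each node's prediction is the root mean plus the telescoping parent-to-child differences, each damped by a factor that grows with the regularization strength `lam` and shrinks with the ancestor's sample count. The helper name `hs_values` and the exact damping form are this sketch's assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def hs_values(fitted_tree, lam):
    """Return a shrunken prediction for every node of a fitted regression tree.

    Sketch of hierarchical shrinkage: child predictions are pulled toward
    ancestor sample means, more strongly when lam is large or nodes are small.
    """
    t = fitted_tree.tree_
    means = t.value[:, 0, 0]            # per-node sample means of y
    shrunk = np.empty_like(means)
    shrunk[0] = means[0]                # root prediction is unchanged

    def recurse(node, cum):
        # cum accumulates prod over ancestors a of (1 + lam / N(a))
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:                  # leaf
            return
        new_cum = cum * (1 + lam / t.n_node_samples[node])
        for child in (left, right):
            shrunk[child] = shrunk[node] + (means[child] - means[node]) / new_cum
            recurse(child, new_cum)

    recurse(0, 1.0)
    return shrunk

# Usage: shrink a tree fit on noisy data; predict via the leaf each point lands in.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = X[:, 0] + 0.3 * rng.standard_normal(200)
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
shrunk = hs_values(tree, lam=10.0)
y_hs = shrunk[tree.apply(X)]            # HS predictions on the training points
```

With `lam=0` the shrunken values telescope back to the ordinary leaf means, and larger `lam` pulls all predictions toward the root mean — matching the summary's description of regularizing without changing the tree structure.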


Consistency of Random Forests
A step forward in forest exploration is taken by proving a consistency result for Breiman's original algorithm in the context of additive regression models, shedding an interesting light on how random forests can adapt to sparsity.
Strong Optimal Classification Trees
This paper proposes an intuitive flow-based MIO formulation that can accommodate side constraints to enable the design of interpretable and fair decision trees and shows that this formulation has a stronger linear optimization relaxation than existing methods.
Local Linear Forests
A central limit theorem valid under regularity conditions on the forest and smoothness constraints is proved, a computationally efficient construction for confidence intervals is proposed, and a causal inference application is discussed.
Do we need hundreds of classifiers to solve real world classification problems?
The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks, and boosting ensembles (5 and 3 members in the top 20, respectively).
Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model
A generative model called Bayesian Rule Lists is introduced that yields a posterior distribution over possible decision lists; it employs a novel prior structure to encourage sparsity and has predictive accuracy on par with the current top algorithms for prediction in machine learning.
Greedy function approximation: A gradient boosting machine.
A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
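The least-squares case of that paradigm reduces to repeatedly fitting a small tree to the current residuals (the negative gradient of squared loss) and adding it to the ensemble with a learning rate. The sketch below is a hedged illustration of that idea, not Friedman's implementation; the function name `gradient_boost_ls` and the use of scikit-learn's `DecisionTreeRegressor` as the base learner are assumptions of this sketch.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_ls(X, y, n_rounds=50, lr=0.1, depth=2):
    """Least-squares gradient boosting sketch: each round fits a shallow tree
    to the residuals (the negative gradient of squared loss) and adds it
    to the additive expansion, damped by the learning rate."""
    init = y.mean()                      # constant initial model
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient for L2 loss
        t = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        pred += lr * t.predict(X)
        trees.append(t)

    def predict(Xq):
        out = np.full(len(Xq), init)
        for t in trees:
            out += lr * t.predict(Xq)
        return out

    return predict

# Usage: boost shallow trees on a smooth additive target.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + X[:, 1]
predict = gradient_boost_ls(X, y)
train_mse = np.mean((predict(X) - y) ** 2)
```

For other losses (least absolute deviation, Huber, multiclass logistic, as the summary lists), only the residual computation changes: the tree is fit to the negative gradient of the chosen loss at the current predictions.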
Definitions, methods, and applications in interpretable machine learning
This work defines interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, with three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy.
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
  • Stefan Wager, S. Athey
  • Mathematics, Computer Science
    Journal of the American Statistical Association
  • 2018
This is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference; the method is found to be substantially more powerful than classical methods based on nearest-neighbor matching.
Linear Aggregation in Tree-based Estimators
A new algorithm is introduced that finds the best axis-aligned split to fit optimal linear aggregation functions on the corresponding nodes, implemented in the provably fastest way, enabling the creation of more interpretable trees and better predictive performance on a wide range of data sets.
Supervised Neighborhoods for Distributed Nonparametric Regression
A new method, Silo, is proposed for fitting prediction-time local models using supervised neighborhoods that adapt to the local shape of the regression surface; it works well in both the serial and distributed settings.