Corpus ID: 239024639

A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds

@article{Tan2021ACT,
  title={A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds},
  author={Yan Shuo Tan and Abhineet Agarwal and Bin Yu},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.09626}
}
Decision trees are important both as interpretable models amenable to high-stakes decision-making and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well understood. The most cited prior works have focused on deriving pointwise consistency guarantees for CART in a classical nonparametric regression setting. We take a different approach, and advocate studying the generalization performance of decision trees with…
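The setup named in the title — fitting CART to data generated by a sparse additive model and measuring generalization error — is easy to reproduce in simulation. Below is a minimal sketch, not the authors' code: the choice of additive components, the dimensions, and the use of scikit-learn's DecisionTreeRegressor as the CART implementation are all illustrative assumptions.

# Minimal sketch (illustrative, not the paper's experiments): fit CART to
# data drawn from a sparse additive model and estimate its test MSE.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def simulate_additive_data(n, d=10, s=3, noise=0.1):
    """X ~ Uniform[0,1]^d; y is a sum of s univariate components plus noise."""
    X = rng.uniform(size=(n, d))
    # Hypothetical additive components acting on the first s coordinates.
    f = sum(np.sin(2 * np.pi * X[:, j]) for j in range(s))
    y = f + noise * rng.standard_normal(n)
    return X, y

X_train, y_train = simulate_additive_data(n=2_000)
X_test, y_test = simulate_additive_data(n=10_000)

tree = DecisionTreeRegressor(min_samples_leaf=5).fit(X_train, y_train)
test_mse = np.mean((tree.predict(X_test) - y_test) ** 2)
print(f"CART test MSE on additive data: {test_mse:.3f}")

Sweeping the training size n and comparing the tree's error against a method tailored to additive structure (e.g., a per-coordinate spline fit) gives the kind of generalization comparison the paper's lower bounds address.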

