• Corpus ID: 246473164

Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods

Abhineet Agarwal, Yan Shuo Tan, Omer Ronen, Chandan Singh, Bin Yu
Tree-based models such as decision trees and random forests (RF) are a cornerstone of modern machine-learning practice. To mitigate overfitting, trees are typically regularized by a variety of techniques that modify their structure (e.g. pruning). We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure, and instead regularizes the tree by shrinking the prediction over each node towards the sample means of its ancestors. The amount of shrinkage is… 
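The following is a minimal sketch of the idea described above: a fitted tree's structure is left untouched, and a prediction is rebuilt by walking from the root to a leaf while damping each step away from the ancestor's sample mean. The abstract is cut off before it states how the amount of shrinkage is set, so the damping factor `1 / (1 + lam / n_parent)` used here is an assumption, as are the names `Node`, `predict_shrunk`, `lam`, and the toy tree itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A node of an already-fitted regression tree (hypothetical structure)."""
    mean: float                     # sample mean of responses reaching this node
    n: int                          # number of training samples reaching this node
    feature: int = 0                # split feature index (unused for leaves)
    threshold: float = 0.0          # split threshold (unused for leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict_shrunk(root: Node, x: Sequence[float], lam: float) -> float:
    """Predict for x, shrinking each node's mean toward its ancestors' means.

    The prediction starts at the root mean and adds each child-minus-parent
    difference along the path, damped by the ASSUMED factor
    1 / (1 + lam / n_parent). With lam = 0 this is the ordinary leaf mean.
    """
    pred = root.mean
    node = root
    while node.left is not None and node.right is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        pred += (child.mean - node.mean) / (1.0 + lam / node.n)
        node = child
    return pred


# Toy tree: parent means are the sample-weighted averages of their children.
tree = Node(
    mean=5.0, n=10, feature=0, threshold=0.5,
    left=Node(mean=2.0, n=4),
    right=Node(
        mean=7.0, n=6, feature=1, threshold=0.5,
        left=Node(mean=6.0, n=3),
        right=Node(mean=8.0, n=3),
    ),
)

unshrunk = predict_shrunk(tree, [1.0, 1.0], lam=0.0)   # plain leaf mean
shrunk = predict_shrunk(tree, [1.0, 1.0], lam=10.0)    # pulled toward ancestors
```

In this sketch, increasing `lam` pulls every prediction toward the root mean, and splits made on few samples are damped more strongly than splits made on many.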
Group Probability-Weighted Tree Sums for Interpretable Modeling of Heterogeneous Data
An instance-weighted tree-sum method that effectively pools data across diverse groups to output a concise, rule-based model that achieves state-of-the-art prediction performance on important clinical datasets.
Predictability and Stability Testing to Assess Clinical Decision Instrument Performance for Children After Blunt Torso Trauma
The PCS data science framework vetted the PECARN CDI and its constituent predictor variables prior to external validation, suggesting that both CDIs will generalize well to new populations, offering a potential strategy to increase the chance of a successful external validation.


Fast Interpretable Greedy-Tree Sums (FIGS)
FIGS generalizes the CART algorithm to simultaneously grow a flexible number of trees in a summation. It can avoid repeated splits and often provides more concise decision rules than fitted decision trees, without sacrificing predictive performance.
Universal Consistency of Decision Trees in High Dimensions
This paper shows that decision trees constructed with Classification and Regression Trees (CART) methodology are universally consistent in an additive model context, even when the number of predictor
Random Forests
Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Generalized and Scalable Optimal Sparse Decision Trees
The contribution in this work is to provide a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables.
A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds
A sharp squared-error generalization lower bound is proved for a large class of decision tree algorithms fitted to sparse additive models with C^1 component functions, and a novel connection is established between decision tree estimation and rate-distortion theory, a sub-field of information theory.
Bayesian Additive Regression Trees
We develop a Bayesian "sum-of-trees" model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian
Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success
It is demonstrated that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings.
Do we need hundreds of classifiers to solve real world classification problems?
The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively).
Classification and regression trees
W. Loh. Computer Science. WIREs Data Mining Knowl. Discov., 2011.
This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model
A generative model called Bayesian Rule Lists is introduced that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity and has predictive accuracy on par with the current top algorithms for prediction in machine learning.