Corpus ID: 88517270

Making Tree Ensembles Interpretable

@article{Hara2016MakingTE,
  title={Making Tree Ensembles Interpretable},
  author={Satoshi Hara and Kohei Hayashi},
  journal={arXiv: Machine Learning},
  year={2016}
}
Tree ensembles, such as random forests and boosted trees, are renowned for their high prediction performance, whereas their interpretability is critically limited. In this paper, we propose a post-processing method that improves the model interpretability of tree ensembles. After learning a complex tree ensemble in a standard way, we approximate it by a simpler model that is interpretable for humans. To obtain the simpler model, we derive an EM algorithm minimizing the KL divergence from the…
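The paper's actual procedure (an EM algorithm minimizing the KL divergence to a simpler model) is more involved than can be reproduced from the abstract; as a minimal illustrative sketch of the underlying idea, approximating a complex ensemble by the single simple rule that best imitates it, consider this pure-Python toy. All names, data, and the squared-error criterion here are assumptions for illustration, not the paper's method:

```python
# Toy "ensemble simplification": a stand-in ensemble of threshold rules is
# approximated by the one threshold rule closest to its averaged predictions.

def ensemble_predict(x, stumps):
    """Average the votes of simple threshold rules (a stand-in ensemble)."""
    return sum(1.0 if x > t else 0.0 for t in stumps) / len(stumps)

def distill_to_single_threshold(xs, stumps):
    """Pick the threshold whose rule is closest (squared error) to the
    ensemble's soft predictions over the sample points xs."""
    best_t, best_err = None, float("inf")
    for t in xs:  # candidate thresholds taken from the data itself
        err = sum((ensemble_predict(x, stumps) - (1.0 if x > t else 0.0)) ** 2
                  for x in xs)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
stumps = [0.45, 0.5, 0.55]          # ensemble members agree around 0.5
t = distill_to_single_threshold(xs, stumps)   # a single interpretable rule
```

The simplified model is a single rule `x > t`, which a human can read directly, whereas the ensemble's averaged vote is not directly inspectable.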

Citations

Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach
TLDR
This study formalizes the simplification of tree ensembles as a model selection problem and derives a Bayesian model selection algorithm that optimizes the simplified model while maintaining the prediction performance.
Interpreting tree ensembles with inTrees
  • Houtao Deng
  • Computer Science, Environmental Science
    International Journal of Data Science and Analytics
  • 2018
TLDR
This work provides the interpretable trees (inTrees) framework that extracts, measures, prunes, selects, and summarizes rules from a tree ensemble, and calculates frequent variable interactions.
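Rule extraction of the kind inTrees performs starts from the observation that every root-to-leaf path in a tree is a rule. As a hedged sketch (the tree encoding and helper below are illustrative assumptions, not Deng's framework), extracting such rules amounts to collecting conditions along each path:

```python
# Sketch of rule extraction in the spirit of inTrees: walk every
# root-to-leaf path of a small hand-built tree and collect the path's
# conditions as a rule.

# A node is either a leaf {"label": ...} or a split
# {"feat": index, "thr": t, "left": subtree, "right": subtree}.
tree = {
    "feat": 0, "thr": 0.5,
    "left": {"label": "A"},
    "right": {
        "feat": 1, "thr": 2.0,
        "left": {"label": "A"},
        "right": {"label": "B"},
    },
}

def extract_rules(node, conditions=()):
    """Return a list of (conditions, label) pairs, one per leaf."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    f, t = node["feat"], node["thr"]
    rules = extract_rules(node["left"], conditions + ((f, "<=", t),))
    rules += extract_rules(node["right"], conditions + ((f, ">", t),))
    return rules

rules = extract_rules(tree)
```

Applied to every tree in an ensemble, this yields the raw rule pool that a framework like inTrees would then measure, prune, and summarize.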
Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable
TLDR
In a user study, humans were better at predicting the output of a tree ensemble classifier when using prototypes than when using Shapley values, a popular feature attribution method.
Scalable Rule-Based Representation Learning for Interpretable Classification
TLDR
Exhaustive experiments show that RRL outperforms the competitive interpretable approaches and can be easily adjusted to obtain a trade-off between classification accuracy and model complexity for different scenarios.
Tree Ensemble Explainability
TLDR
This work presents a method for deriving instance-level explanations for tree ensemble models and examines its applications, focusing on rule-based systems used in domains where the model is an aid to human decision-making, such as for medical diagnoses.
A Survey on the Explainability of Supervised Machine Learning
TLDR
This survey paper provides essential definitions, an overview of the different principles and methodologies of explainable Supervised Machine Learning, and a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions.
Transparent Tree Ensembles
TLDR
This work presents a method for deriving explanations for instance-level decisions in tree ensembles, and opens up the possibility for transparent models at scale.
A survey of methods and tools used for interpreting Random Forest
TLDR
A survey of tools and methods used in the literature to uncover insights into the models produced by RF, classified according to the different aspects of interpretability they address.
Model Bridging: To Interpretable Simulation Model From Neural Network
TLDR
This study investigates a Bayesian neural network model with a few hidden layers serving as an un-explainable machine learning model and proposes a "model bridging" framework to bridge machine learning models with simulation models by a series of kernel mean embeddings.
An exact counterfactual-example-based approach to tree-ensemble models interpretability
TLDR
For any model in the tree-ensemble category, which encompasses a wide range of models used for massive heterogeneous industrial data processing, such as XGBoost, CatBoost, LightGBM, and random forests, a positive answer is found: an exact geometrical characterisation of the decision regions can be derived in the form of a collection of multidimensional intervals.
...

References

SHOWING 1-10 OF 14 REFERENCES
Interpreting tree ensembles with inTrees
  • Houtao Deng
  • Computer Science, Environmental Science
    International Journal of Data Science and Analytics
  • 2018
TLDR
This work provides the interpretable trees (inTrees) framework that extracts, measures, prunes, selects, and summarizes rules from a tree ensemble, and calculates frequent variable interactions.
Trading Interpretability for Accuracy: Oblique Treed Sparse Additive Models
TLDR
It is demonstrated, on simulation, benchmark, and real world datasets, that OT-SpAMs outperform state-of-the-art interpretable models and perform competitively with kernel SVMs, while still providing results that are highly understandable.
Bayesian Treed Generalized Linear Models
SUMMARY: For the standard regression setup, conventional tree models partition the predictor space into regions where the variable of interest, Y, can be approximated by a constant. A treed model…
Optimal Action Extraction for Random Forests and Boosted Trees
TLDR
The NP-hardness of the optimal action extraction problem for ATMs is proved, and the problem is formulated as an integer linear program that can be efficiently solved by existing packages.
XGBoost: A Scalable Tree Boosting System
TLDR
This paper proposes a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and provides insights on cache access patterns, data compression, and sharding to build a scalable tree boosting system called XGBoost.
Random Forests
TLDR
Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
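The "internal estimates" above are computed out-of-bag (OOB): each tree is fit on a bootstrap sample, and each point is evaluated only by trees that never saw it. A minimal sketch of that bookkeeping, using a trivial 1-D threshold stump in place of a real decision tree (the stump, data, and tree count are illustrative assumptions):

```python
import random

# Sketch of a random-forest internal estimate: out-of-bag (OOB) error.
# Each "tree" here is just a threshold stump at the mean of its bootstrap
# sample's x values -- the point is the bootstrap/OOB accounting.

random.seed(0)
data = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(10, 20)]

def fit_stump(sample):
    thr = sum(x for x, _ in sample) / len(sample)
    return lambda x: 1 if x >= thr else 0

n_trees = 30
oob_votes = [[] for _ in data]
for _ in range(n_trees):
    idx = [random.randrange(len(data)) for _ in data]       # bootstrap draw
    predict = fit_stump([data[i] for i in idx])
    for i in set(range(len(data))) - set(idx):              # out-of-bag points
        oob_votes[i].append(predict(data[i][0]))

# Each point is judged only by the trees that never trained on it.
wrong = sum(1 for (x, y), v in zip(data, oob_votes)
            if v and round(sum(v) / len(v)) != y)
oob_error = wrong / len(data)
```

Because OOB points were held out of each tree's training sample, `oob_error` behaves like a built-in cross-validation estimate and needs no separate test set.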
Greedy function approximation: A gradient boosting machine.
TLDR
A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
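The least-squares case of this paradigm is the easiest to see concretely: the negative gradient of the squared loss is just the residual, so each stage fits a weak learner to the current residuals. A self-contained toy version with depth-1 stumps (the data, learning rate, and stage count are illustrative choices, not Friedman's implementation):

```python
# Toy gradient boosting for least-squares regression: each stage fits a
# single-split "stump" to the current residuals, i.e. the negative
# gradient of the squared loss, and adds it with a shrinkage factor.

def fit_stump(xs, residuals):
    """Best single-split constant predictor under squared error."""
    best = None
    for t in xs:                               # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_stages=20, lr=0.5):
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_stages):
        resid = [y - p for y, p in zip(ys, pred)]   # negative gradient of L2 loss
        s = fit_stump(xs, resid)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # a step function the stumps can fit
f = boost(xs, ys)
```

Swapping the residual computation for the gradient of another loss (absolute deviation, Huber, logistic likelihood) gives the other members of the family the abstract mentions, with the stage-fitting loop unchanged.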
A System for Induction of Oblique Decision Trees
TLDR
This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree.
Web-Search Ranking with Initialized Gradient Boosted Regression Trees
TLDR
This paper investigates Random Forests as a low-cost alternative to Gradient Boosted Regression Trees (GBRT), the de facto standard of web-search ranking, and provides an upper bound on the Expected Reciprocal Rank (Chapelle et al., 2009) in terms of classification error.
Hierarchical mixtures of experts and the EM algorithm
  • M. I. Jordan, R. Jacobs
  • Computer Science
    Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan)
  • 1993
TLDR
An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, along with an on-line learning algorithm in which the parameters are updated incrementally.
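The EM pattern used for such mixture architectures alternates an E-step (compute each component's responsibility for each point) with an M-step (responsibility-weighted parameter updates). A much simpler model than a hierarchical mixture of experts, a two-component 1-D Gaussian mixture, shows the same skeleton (the data and initialization below are illustrative assumptions):

```python
import math

# EM on a two-component 1-D Gaussian mixture: the same E-step/M-step
# alternation used for tree-structured mixture models, on a toy model.

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, mu=(-1.0, 1.0), var=(1.0, 1.0), pi=0.5, iters=50):
    mu, var = list(mu), list(var)
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point
        r = []
        for x in data:
            p0 = pi * normal_pdf(x, mu[0], var[0])
            p1 = (1 - pi) * normal_pdf(x, mu[1], var[1])
            r.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted means, variances, mixing weight
        n0 = sum(r); n1 = len(data) - n0
        mu[0] = sum(ri * x for ri, x in zip(r, data)) / n0
        mu[1] = sum((1 - ri) * x for ri, x in zip(r, data)) / n1
        var[0] = max(sum(ri * (x - mu[0]) ** 2 for ri, x in zip(r, data)) / n0, 1e-6)
        var[1] = max(sum((1 - ri) * (x - mu[1]) ** 2 for ri, x in zip(r, data)) / n1, 1e-6)
        pi = n0 / len(data)
    return mu, var, pi

data = [-2.1, -1.9, -2.0, 1.9, 2.1, 2.0]   # two well-separated clusters
mu, var, pi = em_two_gaussians(data)
```

In the hierarchical mixtures-of-experts setting, the responsibilities come from the gating networks and the M-step updates expert and gate parameters, but the alternation is the same.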
...