Better Short than Greedy: Interpretable Models through Optimal Rule Boosting

@article{Boley2021BetterST,
  title={Better Short than Greedy: Interpretable Models through Optimal Rule Boosting},
  author={Mario Boley and Simon Teshuva and Pierre Le Bodic and Geoffrey I. Webb},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.08380}
}
Rule ensembles are designed to provide a useful trade-off between predictive accuracy and model interpretability. However, the myopic and random search components of current rule ensemble methods can compromise this goal: they often need more rules than necessary to reach a given accuracy level, or can even fail outright to accurately model a distribution that can in fact be described well by a few rules. Here, we present a novel approach aiming to fit rule ensembles of maximal predictive…
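To make the setting concrete, here is a minimal, illustrative sketch of rule boosting for regression: each round adds one rule fitted to the current residuals, so the predictive power per rule directly determines how few rules (and thus how interpretable a model) one ends up with. All names below (`fit_rule_boosting` and friends) are hypothetical, and the exhaustive single-condition search stands in for the paper's optimal branch-and-bound search over full conjunctions.

```python
import numpy as np

def fit_rule_boosting(X, y, n_rules=5, rate=1.0):
    """Toy rule boosting for squared loss: each round adds one
    single-condition rule fitted to the current residuals.
    (Sketch only: the paper instead finds the optimal multi-literal
    rule per round via branch-and-bound.)"""
    n, d = X.shape
    pred = np.full(n, y.mean())                # start from the base rate
    rules = [("intercept", y.mean())]
    for _ in range(n_rules):
        resid = y - pred                       # negative gradient of squared loss
        best = None
        for j in range(d):
            for t in np.unique(X[:, j]):
                for op, cover in (("<=", X[:, j] <= t), (">", X[:, j] > t)):
                    if not cover.any():
                        continue
                    w = resid[cover].mean()    # loss-optimal weight on this cover
                    gain = cover.sum() * w**2  # resulting squared-loss reduction
                    if best is None or gain > best[0]:
                        best = (gain, j, op, t, cover, w)
        _, j, op, t, cover, w = best
        pred[cover] += rate * w
        rules.append((f"x{j} {op} {t}", rate * w))
    return rules, pred
```

Calling `fit_rule_boosting(X, y, n_rules=3)` on numeric arrays yields human-readable conditions with additive weights; a myopic learner would instead grow each condition greedily, which is exactly the failure mode the abstract describes.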

Citations

Robust subgroup discovery
TLDR
SSD++ is proposed, a greedy heuristic that builds good subgroup lists and guarantees that the subgroup added in each iteration is the most significant one according to the MDL criterion; empirically, SSD++ outperforms previous subgroup discovery methods in terms of quality, generalisation to unseen data, and subgroup list size.

References

Showing 1-10 of 23 references
Diverse Rule Sets
TLDR
This work proposes a novel approach to inferring diverse rule sets by minimizing overlap among decision rules, with a 2-approximation guarantee under the framework of Max-Sum diversification, and designs an efficient randomized algorithm that samples rules that are highly discriminative and have small overlap.
ENDER: a statistical framework for boosting decision rules
TLDR
A learning algorithm called ENDER constructs an ensemble of decision rules; it is tailored for regression and binary classification problems and learns via boosting, which can be treated as a generalization of sequential covering.
Interpretable Decision Sets: A Joint Framework for Description and Prediction
TLDR
This work proposes interpretable decision sets, a framework for building predictive models that are highly accurate, yet also highly interpretable, and provides a new approach to interpretable machine learning that balances accuracy, interpretability, and computational efficiency.
A simple, fast, and effective rule learner
We describe SLIPPER, a new rule learner that generates rulesets by repeatedly boosting a simple, greedy rule-builder. Like the rulesets built by other rule learners, the ensemble of rules created by SLIPPER is compact and comprehensible.
Learning Certifiably Optimal Rule Lists for Categorical Data
TLDR
The results indicate that it is possible to construct optimal sparse rule lists that are approximately as accurate as the COMPAS proprietary risk prediction tool on data from Broward County, Florida, but that are completely interpretable.
Greedy function approximation: A gradient boosting machine.
TLDR
A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
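Since several of the methods above build on this paradigm, a compact sketch may help. The loop below fits each base learner to the negative gradient of a pluggable loss; `fit_stump`, `ls_grad`, and `lad_grad` are illustrative stand-ins chosen for brevity, not code from the paper.

```python
import numpy as np

def fit_stump(X, r):
    """Minimal base learner: best single-feature threshold split on r."""
    mu = r.mean()
    best_sse, best_h = ((r - mu) ** 2).sum(), (lambda Z: np.full(len(Z), mu))
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # drop max so both sides are non-empty
            m = X[:, j] <= t
            a, b = r[m].mean(), r[~m].mean()
            sse = ((r[m] - a) ** 2).sum() + ((r[~m] - b) ** 2).sum()
            if sse < best_sse:
                best_sse = sse
                best_h = lambda Z, j=j, t=t, a=a, b=b: np.where(Z[:, j] <= t, a, b)
    return best_h

def gradient_boost(X, y, loss_grad, n_stages=50, rate=0.1):
    """Generic gradient-descent boosting loop: each stage fits a base
    learner to the negative gradient ("pseudo-residuals") of the loss
    at the current predictions, then takes a damped additive step."""
    F = np.full(len(y), y.mean())
    stages = []
    for _ in range(n_stages):
        h = fit_stump(X, -loss_grad(y, F))     # fit to pseudo-residuals
        F += rate * h(X)
        stages.append(h)
    return stages, F

# Gradients of two of the regression losses mentioned above:
ls_grad = lambda y, F: F - y                   # least squares, (1/2)(y - F)^2
lad_grad = lambda y, F: -np.sign(y - F)        # least absolute deviation, |y - F|
```

Swapping in a different `loss_grad` changes the whole method while the loop stays fixed, which is the point of the paradigm.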
Predictive Learning via Rule Ensembles
General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables.
Maximum likelihood rule ensembles
TLDR
A new rule induction algorithm solves classification problems via probability estimation: a single decision rule is treated as a base classifier in an ensemble, and the ensemble estimates the class conditional probability distribution.
Tight Optimistic Estimates for Fast Subgroup Discovery
TLDR
This paper shows that optimistic estimate pruning can be developed into a sound and highly effective pruning approach for subgroup discovery, presents tight optimistic estimates for the most popular binary and multi-class quality functions, and introduces a family of increasingly efficient approximations to these optimal estimates.
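To make the pruning mechanism concrete, here is a minimal branch-and-bound subgroup search under simplifying assumptions (binary features, binary target, a WRAcc-style quality function); the cited paper derives such bounds for a wider family of quality functions, and all identifiers here are illustrative.

```python
import numpy as np

def best_subgroup(X, y, max_depth=3):
    """Depth-first search over conjunctions of binary features,
    pruned with a tight optimistic estimate.
    Quality: q(S) = pos_S - |S| * p0  (a WRAcc-style function).
    Bound:   any refinement of S keeps at most pos_S positives, so
             q(refinement) <= pos_S * (1 - p0), attained by the
             sub-cover consisting of exactly the positives."""
    n, d = X.shape
    p0 = y.mean()
    best = {"quality": -np.inf, "conditions": ()}

    def search(conds, cover, start):
        q = y[cover].sum() - cover.sum() * p0
        if q > best["quality"]:
            best.update(quality=q, conditions=conds)
        if len(conds) == max_depth:
            return
        for j in range(start, d):
            c = cover & (X[:, j] == 1)
            # Optimistic estimate pruning: skip branches that provably
            # cannot beat the incumbent.
            if c.any() and y[c].sum() * (1 - p0) > best["quality"]:
                search(conds + (j,), c, j + 1)

    search((), np.ones(n, dtype=bool), 0)
    return best
```

The approximations mentioned in the TLDR would replace the exact bound `y[c].sum() * (1 - p0)` with cheaper surrogates, trading pruning power for evaluation speed.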
XGBoost: A Scalable Tree Boosting System
TLDR
This paper proposes a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and provides insights on cache access patterns, data compression, and sharding to build a scalable tree boosting system called XGBoost.