Corpus ID: 201070843

SIRUS: making random forests interpretable

@article{Bnard2019SIRUSMR,
  title={SIRUS: making random forests interpretable},
  author={Cl{\'e}ment B{\'e}nard and G{\'e}rard Biau and S{\'e}bastien Da Veiga and Erwan Scornet},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.06852}
}
State-of-the-art learning algorithms, such as random forests or neural networks, are often qualified as "black-boxes" because of the high number and complexity of operations involved in their prediction mechanism. This lack of interpretability is a strong limitation for applications involving critical decisions, typically the analysis of production processes in the manufacturing industry. In such critical contexts, models have to be interpretable, i.e., simple, stable, and predictive. To… 

Interpretable Random Forests via Rule Extraction

This work introduces SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm that takes the form of a short and simple list of rules, combining a simple structure with remarkably stable behavior when the data is perturbed.

Visualisation and knowledge discovery from interpretable models

The newly developed classifiers helped in investigating the complexities of the UCI dataset as a multiclass problem and were comparable to those reported in the literature for this dataset, with the additional value of interpretability when the dataset was treated as a binary class problem.

A rigorous method to compare interpretability

The aim of this article is to propose a rigorous mathematical definition of interpretability that allows fair comparisons between rule-based algorithms. The definition is built from three notions, each quantitatively measured by a simple formula: predictivity, stability, and simplicity.

MP-Boost: Minipatch Boosting via Adaptive Feature and Observation Sampling

Boosting methods are among the best general-purpose and off-the-shelf machine learning approaches, gaining widespread popularity. In this paper, we seek to develop a boosting method that yields

Robust and Heterogenous Odds Ratio: Estimating Price Sensitivity for Unbought Items

J. Pauphilet, Economics, Manufacturing & Service Operations Management, 2022
Problem definition: Mining for heterogeneous responses to an intervention is a crucial step for data-driven operations, for instance, to personalize treatment or pricing. We investigate how to

Predicting Cell-Penetrating Peptides: Building and Interpreting Random Forest based prediction Models

This work builds prediction models for CPPs, exploring features that cover a range of properties based on amino acid sequences, using random forest classifiers, which are often more interpretable than other ensemble machine learning algorithms.

A framework for the risk prediction of avian influenza occurrence: An Indonesian case study

A framework for predicting the occurrence and spread of avian influenza events in a geographical area is proposed. The results suggest that the framework could act as a tool to gain a broad understanding of the drivers of avian influenza epidemics and may facilitate the prediction of future disease events.

References

SIRUS: Stable and Interpretable RUle Set for classification

SIRUS (Stable and Interpretable RUle Set), a new classification algorithm based on random forests, which takes the form of a short list of rules, achieves a remarkable stability improvement over cutting-edge methods.

ENDER: a statistical framework for boosting decision rules

A learning algorithm called ENDER is presented that constructs an ensemble of decision rules. It is tailored for regression and binary classification problems and uses the boosting approach for learning, which can be treated as a generalization of sequential covering.

Interpretable Decision Sets: A Joint Framework for Description and Prediction

This work proposes interpretable decision sets, a framework for building predictive models that are highly accurate, yet also highly interpretable, and provides a new approach to interpretable machine learning that balances accuracy, interpretability, and computational efficiency.

Generating Accurate Rule Sets Without Global Optimization

This paper presents an algorithm for inferring rules by repeatedly generating partial decision trees, thus combining the two major paradigms for rule generation—creating rules from decision trees and the separate-and-conquer rule-learning technique.

Node harvest

When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy.

Definitions, methods, and applications in interpretable machine learning

This work defines interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, and introduces 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy.

Random Forests

Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

A generative model called Bayesian Rule Lists is introduced that yields a posterior distribution over possible decision lists that employs a novel prior structure to encourage sparsity and has predictive accuracy on par with the current top algorithms for prediction in machine learning.

Interpretable machine learning: definitions, methods, and applications

This paper first defines interpretability in the context of machine learning and places it within a generic data science life cycle, and introduces the Predictive, Descriptive, Relevant (PDR) framework, consisting of three desiderata for evaluating and constructing interpretations.

Predictive Learning via Rule Ensembles

General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements
...
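
The rule-ensemble idea summarized in the last entry (a model built as a linear combination of rules, each rule a conjunction of a few simple statements) can be sketched as follows. This is only an illustrative sketch, assuming that a "simple statement" is a threshold test on one numeric feature; the names `Condition`, `Rule`, `predict`, and every rule, weight, and threshold below are hypothetical and are not taken from any of the listed papers.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Condition:
    """One simple statement: a threshold test on a single feature (assumed form)."""
    feature: int       # index into the input vector
    threshold: float
    less_than: bool    # True: x[feature] < threshold; False: x[feature] >= threshold

    def holds(self, x: Sequence[float]) -> bool:
        value = x[self.feature]
        return value < self.threshold if self.less_than else value >= self.threshold


@dataclass(frozen=True)
class Rule:
    """A rule is the conjunction (logical AND) of its conditions."""
    conditions: Tuple[Condition, ...]

    def fires(self, x: Sequence[float]) -> int:
        # 1 if every condition holds, else 0
        return int(all(c.holds(x) for c in self.conditions))


def predict(x: Sequence[float], intercept: float,
            rules: Sequence[Rule], weights: Sequence[float]) -> float:
    """Linear combination of rule indicators: intercept + sum_k w_k * r_k(x)."""
    return intercept + sum(w * r.fires(x) for r, w in zip(rules, weights))


# Hypothetical two-rule model on a two-feature input.
RULES = [
    Rule((Condition(0, 2.0, True), Condition(1, 5.0, False))),  # x0 < 2 AND x1 >= 5
    Rule((Condition(0, 2.0, False),)),                          # x0 >= 2
]
WEIGHTS = [1.5, -0.5]
INTERCEPT = 10.0

if __name__ == "__main__":
    print(predict([1.0, 6.0], INTERCEPT, RULES, WEIGHTS))  # only the first rule fires
    print(predict([3.0, 0.0], INTERCEPT, RULES, WEIGHTS))  # only the second rule fires
```

In this sketch each prediction can be read off directly: the intercept plus the weights of the rules that fire. How the rules are generated and how the weights are fitted differs between the methods listed above and is deliberately left out.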