Fitting Prediction Rule Ensembles with R Package pre

@article{Fokkema2020FittingPR,
  title={Fitting Prediction Rule Ensembles with R Package pre},
  author={Marjolein Fokkema},
  journal={Journal of Statistical Software},
  year={2020}
}
  • M. Fokkema
  • Published 22 July 2017
  • Computer Science
  • Journal of Statistical Software
Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. […] Key result: pre derives ensembles with predictive accuracy comparable to that of random forests, while using fewer variables for prediction.
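A minimal sketch of how an ensemble is fitted with package pre, using the airquality data as in the paper's running example (assumes the pre package is installed; the seed is illustrative):

```r
library(pre)

# Drop rows with missing values, as in the paper's airquality example
airq <- airquality[complete.cases(airquality), ]

set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airq)  # fit a prediction rule ensemble

print(airq.ens)                # selected rules and their coefficients
importance(airq.ens)           # variable importances: typically few variables used
predict(airq.ens, newdata = airq[1:3, ])  # predictions for new observations
```

The printed ensemble is a short list of rules with coefficients, which is what makes the model directly interpretable.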


Improved prediction rule ensembling through model-based data generation
The use of surrogacy models can substantially improve the sparsity of PRE, while retaining predictive accuracy, especially through the use of a nested surrogacy approach.
Fitting prediction rule ensembles to psychological research data: An introduction and tutorial.
The methodology is introduced, and several real-data examples from psychological research show how PREs can be fitted using the R package pre, illustrating features of the package that may be particularly useful for applications in psychology.
Linear Aggregation in Tree-based Estimators
A new algorithm is introduced that finds the best axis-aligned split to fit optimal linear aggregation functions on the corresponding nodes, implemented in a provably fast way, yielding more interpretable trees and better predictive performance on a wide range of data sets.
SIRUS: making random forests interpretable
SIRUS (Stable and Interpretable RUle Set) is a new classification algorithm based on random forests that takes the form of a short list of rules and achieves a remarkable stability improvement over state-of-the-art methods.
GPSRL: Learning Semi-Parametric Bayesian Survival Rule Lists from Heterogeneous Patient Data
This paper proposes a new semi-parametric Bayesian Survival Rule List model that derives a rule-based decision-making approach; within the regime defined by each rule, survival risk is modelled via a Gaussian process latent variable model.
Learning Interpretable Rules Contributing to Maximal Fuel Rate Flow Consumption in an Aircraft using Rule Based Algorithms
The main aim of this paper is to extract interpretable and visually justifiable rules for fuel intake in each phase of flight using a new rule-based algorithm, the Generalized Linear Rules Model, which is still under active research in the machine-learning space.
Understanding the complexity of sepsis mortality prediction via rule discovery and analysis: a pilot study
The Glasgow Coma Scale, serum potassium, and serum bilirubin are found to be the most important risk factors for predicting death in sepsis patients.
Melancholia defined with the precision of a machine.

References

Showing 1-10 of 39 references
PREDICTIVE LEARNING VIA RULE ENSEMBLES
General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables.
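This linear-combination-of-rules form can be sketched directly; the rules, variables, and coefficients below are illustrative, not taken from the paper:

```r
# Each rule is a conjunction of simple statements that evaluates to 0 or 1
rule1 <- function(x) as.numeric(x["Wind"] < 6 & x["Temp"] > 84)
rule2 <- function(x) as.numeric(x["Solar.R"] > 150)

# The ensemble prediction is F(x) = a0 + a1 * r1(x) + a2 * r2(x);
# coefficients a0, a1, a2 are illustrative placeholders
predict_ensemble <- function(x, a0 = 20, a1 = 35, a2 = 10) {
  a0 + a1 * rule1(x) + a2 * rule2(x)
}

x <- c(Wind = 5, Temp = 90, Solar.R = 200)
predict_ensemble(x)  # both rules fire: 20 + 35 + 10 = 65
```

Because each rule is a simple 0/1 condition, the fitted coefficients read directly as the contribution of each rule to the prediction.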
MODIFIED RULE ENSEMBLE METHOD FOR BINARY DATA AND ITS APPLICATIONS
This study solved the excess pruning problem by constructing RuleFit within a logistic regression framework, weighting the base learners with an elastic net penalty, and demonstrated higher predictive performance than the original RuleFit model.
Solving Regression by Learning an Ensemble of Decision Rules
A novel decision rule induction algorithm for solving the regression problem; experiments presented in the paper show that the resulting prediction model, an ensemble of decision rules, is powerful.
Node harvest
When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy.
An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.
The aim of this work is to introduce the principles of standard recursive partitioning methods and recent methodological improvements, to illustrate their use for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application.
ENDER: a statistical framework for boosting decision rules
ENDER is a learning algorithm that constructs an ensemble of decision rules tailored for regression and binary classification problems; it uses a boosting approach for learning, which can be treated as a generalization of sequential covering.
Generating Rule Sets from Model Trees
This paper presents an algorithm for inducing simple, accurate decision lists from model trees and shows that this method produces comparably accurate and smaller rule sets than the commercial state-of-the-art rule learning system Cubist.
Greedy function approximation: A gradient boosting machine.
A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion; specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
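The least-squares case of this paradigm can be sketched with decision stumps as base learners (an illustrative toy implementation, not the paper's code; `M` and `nu` are our names for the number of iterations and the shrinkage rate):

```r
# Fit a decision stump to residuals r: find the split on x minimizing SSE
fit_stump <- function(x, r) {
  best <- list(sse = Inf)
  for (s in sort(unique(x))[-1]) {
    left <- r[x < s]; right <- r[x >= s]
    sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    if (sse < best$sse)
      best <- list(sse = sse, split = s, left = mean(left), right = mean(right))
  }
  best
}

# Gradient boosting for squared loss: repeatedly fit stumps to residuals,
# which are the negative gradient of the loss, and add them with shrinkage nu
boost <- function(x, y, M = 100, nu = 0.1) {
  f <- rep(mean(y), length(y))   # initial constant fit
  for (m in 1:M) {
    r <- y - f                   # residuals = negative gradient of squared loss
    st <- fit_stump(x, r)
    f <- f + nu * ifelse(x < st$split, st$left, st$right)
  }
  f
}

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
f <- boost(x, y)
mean((y - f)^2)  # training error shrinks as M grows
```

Other loss functions in the paradigm change only which "pseudo-residuals" the base learner is fitted to at each step.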
Benchmarking Open-Source Tree Learners in R/RWeka
Both classification tree algorithms are found to be competitive in terms of misclassification error, with the performance difference varying clearly across data sets; however, C4.5 tends to grow larger and thus more complex trees.
Classification and Regression by randomForest
Random forests are proposed, which add an additional layer of randomness to bagging and are robust against overfitting; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.