Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)

@article{Breiman2001StatisticalMT,
  title={Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)},
  author={L. Breiman},
  journal={Statistical Science},
  year={2001},
  volume={16},
  pages={199--231}
}
  • L. Breiman
  • Published 1 August 2001
  • Computer Science
  • Statistical Science
From data to decisions through new interfaces between optimization and statistics
TLDR
This thesis investigates new modes of data-driven decision making, enabled by novel connections between optimization and statistics, and proposes a novel method for data-driven stochastic optimization that combines finite-sample guarantees with large-sample convergence by leveraging new theory linking distributionally-robust optimization and statistical hypothesis testing.
Deep Learning Partial Least Squares
TLDR
This framework provides a nonlinear extension of PLS together with a disciplined approach to feature selection and architecture design in deep learning, which leads to a statistical interpretation of deep learning that is tailor-made for predictive problems.
Supervised Machine Learning Techniques: An Overview with Applications to Banking
TLDR
An application from credit risk modelling in banking is used throughout the paper to illustrate the techniques and interpret the results of the algorithms, and an extensive discussion of hyper‐parameter optimisation techniques is provided.
Modelling the Potential Impacts of Climate Change on Rice Cultivation in Mekong Delta, Vietnam
Rice paddy fields, considered human-made wetland ecosystems, play important roles in food production and ecosystem conservation. Nowadays, rice cultivation in the Mekong Delta, Vietnam, is under
All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously
TLDR
Model class reliance (MCR) is proposed as the range of VI values across all well-performing models in a prespecified class, which gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well.
All Models are Wrong but many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance
TLDR
Model class reliance (MCR) is proposed as the range of VI values across all well-performing models in a prespecified class, which gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well.
Supervised Machine Learning for Population Genetics: A New Paradigm
Student success prediction in MOOCs
TLDR
This article presents a categorization of MOOC research according to the predictors, prediction, and underlying theoretical model, and critically surveys work across each category, providing data on the raw data source, feature engineering, statistical model, evaluation method, prediction architecture, and other aspects of these experiments.
Habitat availability and gene flow influence diverging local population trajectories under scenarios of climate change: a place-based approach.
TLDR
The potential impact of climate change on the American pika is explored using a replicated place-based approach that incorporates climate, gene flow, habitat configuration, and microhabitat complexity into SDMs and results in diverse and highly divergent future potential occupancy patterns for pikas.

References

Spline Models for Observational Data
Foreword 1. Background 2. More splines 3. Equivalence and perpendicularity, or, what's so special about splines? 4. Estimating the smoothing parameter 5. 'Confidence intervals' 6. Partial spline
A Combined Structural and Flexible Functional Approach for Modeling Energy Substitution
TLDR
A data-exploration strategy is suggested that combines the parametric family of a system of quadratic log-ratio demand equations and conventional flexible functional forms to uncover the underlying degree of input substitution.
Graphical Methods for Assessing Logistic Regression Models
TLDR
Modifications and extensions of linear model displays lead to three methods for diagnostic checking of logistic regression models, which are illustrated through the analyses of simulated and real data.
Arcing classifier (with discussion and a rejoinder by the author)
TLDR
Two arcing algorithms are explored, compared to each other and to bagging, and the definitions of bias and variance for a classifier as components of the test set error are introduced.
Computer-Intensive Methods in Statistics
TLDR
The bootstrap method is examined and evaluated as an example of this new generation of statistical tools that take advantage of the high speed digital computer and free the statistician to attack more complicated problems.
Cross‐Validatory Choice and Assessment of Statistical Predictions
SUMMARY A generalized form of the cross-validation criterion is applied to the choice and assessment of prediction using the data-analytic concept of a prescription. The examples used to illustrate