Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)
@article{Breiman2001StatisticalMT, title={Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)}, author={L. Breiman}, journal={Statistical Science}, year={2001}, volume={16}, pages={199-231} }
2,495 Citations
From data to decisions through new interfaces between optimization and statistics
- Computer Science
- 2015
This thesis investigates new modes of data-driven decision making, enabled by novel connections between optimization and statistics, and proposes a novel method for data-driven stochastic optimization that combines finite-sample guarantees with large-sample convergence by leveraging new theory linking distributionally-robust optimization and statistical hypothesis testing.
Deep Learning Partial Least Squares
- Computer Science
- 2021
This framework provides a nonlinear extension of PLS together with a disciplined approach to feature selection and architecture design in deep learning, which leads to a statistical interpretation of deep learning that is tailor-made for predictive problems.
Supervised Machine Learning Techniques: An Overview with Applications to Banking
- Computer Science
- International Statistical Review
- 2021
An application from credit risk modelling in banking is used throughout the paper to illustrate the techniques and interpret the results of the algorithms, and an extensive discussion of hyper‐parameter optimisation techniques is provided.
Modelling the Potential Impacts of Climate Change on Rice Cultivation in Mekong Delta, Vietnam
- Environmental Science
- 2020
Rice paddy fields, considered human-made wetland ecosystems, play important roles in food production and ecosystem conservation. Nowadays, rice cultivation in the Mekong Delta, Vietnam, is under…
All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously
- Computer Science
- J. Mach. Learn. Res.
- 2019
Model class reliance (MCR) is proposed as the range of VI values across all well-performing models in a prespecified class, which gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well.
All Models are Wrong but many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance
- Computer Science
- 2018
Model class reliance (MCR) is proposed as the range of VI values across all well-performing models in a prespecified class, which gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well.
Supervised Machine Learning for Population Genetics: A New Paradigm
- Biology
- Trends in genetics : TIG
- 2018
Student success prediction in MOOCs
- Psychology
- User Modeling and User-Adapted Interaction
- 2018
This article presents a categorization of MOOC research according to the predictors, prediction, and underlying theoretical model, and critically surveys work across each category, providing data on the raw data source, feature engineering, statistical model, evaluation method, prediction architecture, and other aspects of these experiments.
Habitat availability and gene flow influence diverging local population trajectories under scenarios of climate change: a place-based approach.
- Environmental Science
- Global change biology
- 2016
The potential impact of climate change on the American pika is explored using a replicated place-based approach that incorporates climate, gene flow, habitat configuration, and microhabitat complexity into SDMs and results in diverse and highly divergent future potential occupancy patterns for pikas.
References
Spline Models for Observational Data
- Mathematics
- 1990
Foreword 1. Background 2. More splines 3. Equivalence and perpendicularity, or, what's so special about splines? 4. Estimating the smoothing parameter 5. 'Confidence intervals' 6. Partial spline…
A Combined Structural and Flexible Functional Approach for Modeling Energy Substitution
- Economics
- 1988
A data-exploration strategy is suggested that combines the parametric family of a system of quadratic log-ratio demand equations and conventional flexible functional forms to uncover the underlying degree of input substitution.
Graphical Methods for Assessing Logistic Regression Models
- Computer Science
- 1984
Modifications and extensions of linear model displays lead to three methods for diagnostic checking of logistic regression models, which are illustrated through the analyses of simulated and real data.
Arcing classifier (with discussion and a rejoinder by the author)
- Computer Science
- 1998
Two arcing algorithms are explored, compared to each other and to bagging, and the definitions of bias and variance for a classifier as components of the test set error are introduced.
Computer-Intensive Methods in Statistics
- Computer Science
- 1983
The bootstrap method is examined and evaluated as an example of this new generation of statistical tools that take advantage of the high speed digital computer and free the statistician to attack more complicated problems.
Cross‐Validatory Choice and Assessment of Statistical Predictions
- Business
- 1974
Summary: A generalized form of the cross-validation criterion is applied to the choice and assessment of prediction using the data-analytic concept of a prescription. The examples used to illustrate…