Corpus ID: 5956609

Statistical modeling: The two cultures

  • Leo Breiman
  • Published 2001
  • Computer Science
  • Quality Engineering
There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems…


Statistical Inference After Model Selection
This paper examines a variety of model selection procedures routinely undertaken followed by statistical tests and confidence intervals computed for a “final” model in criminology and shows how they are typically misguided.
Big Data is not only about data: The two cultures of modelling
A brief discussion of model-based recursive partitioning which can bridge the theory and data-driven approach to statistical modelling and is an example of how this new approach can help revise models that work for the full dataset.
Discussion Paper
The views expressed in this paper are those of the author(s) and do not necessarily reflect the policies of Statistics Netherlands. Data sources referred to as Big data become available for use by…
6-2010 Statistical Inference After Model Selection
Conventional statistical inference requires that a model of how the data were generated be known before the data are analyzed. Yet in criminology, and in the social sciences more broadly, a variety…
Comment on "Statistical Modeling: The Two Cultures" by Leo Breiman
Motivated by Breiman’s rousing 2001 paper on the “two cultures” in statistics, we consider the role that different modeling approaches play in causal inference. We discuss the relationship between…
Distributional Trees and Forests
Obtaining valuable information from given data requires the use of appropriate methods of analysis. For example, if a certain variable of interest is assumed to depend on a (set of) covariate(s),…
A problem-solving approach to data analysis for economics
Data analysis for formal methods is constrained due to the lengthy dominance of the econometric view within economics. Best practice in statistics suggests a shift in emphasis from making statements…
Big data and its epistemology
Whether Big Data, in the form of data‐driven science, will enable the discovery, or appraisal, of universal scientific theories, instrumentalist tools, or inductive inferences is considered.
It takes two to tango: Statistical modeling and machine learning
A scenario is presented showing that when the learning gained from a statistical method is applied to machine learning, the ultimate benefit can be greater than the sum of each method’s benefits.
The Causal Nature of Modeling with Big Data
It is shown to lack a pronounced hierarchical, nested structure and the significance of the transition to such “horizontal” modeling is underlined by the concurrent emergence of novel inductive methodology in statistics such as non-parametric statistics.


Statistical models and shoe leather
Abstract. Regression models have been used in the social sciences at least since 1899, when Yule published a paper on the causes of pauperism. Regression models are now used to make causal…
Computer-Intensive Methods in Statistics
In the past few years there has been a surge in the development of new statistical theories and methods that take advantage of the high speed digital computer. The payoff for such intensive…
From Association to Causation via Regression
For nearly a century, investigators in the social sciences have used regression models to deduce cause-and-effect relationships from patterns of association. Path models and automated search…
Discussion of David Freedman’s “Some Issues in the Foundations of Statistics”
While results from statistical modelling too often receive blind acceptance, we question whether there is any real alternative to use of modelling. This does not diminish the main point of Professor…
The problem of regions
In the problem of regions, we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive…
Computer Intensive Methods in Statistics
Four topics that have been treated in more detail were: Bayesian Computing; Interfacing Statistics and Computers; Image Analysis; Resampling Methods.
Nonparametric Statistical Data Modeling
Abstract. This article attempts to describe an approach to statistical data analysis which is simultaneously parametric and nonparametric. Given a random sample X_1, …, X_n of a random variable X, one…
The 1991 Census Adjustment: Undercount or Bad Data?
Careful scrutiny of these studies, together with auxiliary sources of information provided by the Census Bureau, is used to examine the issue of whether the data gathered in the Post Enumeration Survey can provide reliable undercount estimates.
Graphical Methods for Assessing Logistic Regression Models
Abstract. In ordinary linear regression, graphical diagnostic displays can be very useful for detecting and examining anomalous features in the fit of a model to data. For logistic regression models,…
Scientific Method, Statistical Method and the Speed of Light
A history of the speed of light up to the time of Michelson's study is presented, and the details of a single study make it possible to place the method of statistics within the larger context of science.