To Explain or to Predict

  • Galit Shmueli
  • Published 5 January 2011
  • Computer Science, Mathematics
  • arXiv: Methodology
Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been…

Explanatory Versus Predictive Modeling
  • Kristin L. Sainani
  • Medicine
  • PM & R : the journal of injury, function, and rehabilitation
  • 2014
The differences between explanatory and predictive modeling, which affect every aspect of model building and evaluation, are reviewed.
What Can We Learn from Predictive Modeling?
The central benefits of predictive modeling are reviewed from a perspective uncommon in the existing literature: it is focused on how predictive modeling can be used to complement and augment standard associational analyses.
A Unified Statistical Framework for Evaluating Predictive Methods
A unified statistical framework is presented for evaluating predictive methods that can be applied to most problems and datasets and has the theoretical advantage that it is not necessary to assume a normal distribution.
Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning
  • T. Yarkoni, Jacob Westfall
  • Psychology, Medicine
  • Perspectives on psychological science : a journal of the Association for Psychological Science
  • 2017
It is proposed that principles and techniques from the field of machine learning can help psychology become a more predictive science, and that an increased focus on prediction, rather than explanation, can ultimately lead to greater understanding of behavior.
A note on the interpretation of tree-based regression models.
If the generating model contains chains of direct and indirect effects, then the typical variable importance measures suggest selecting as important mainly the background variables, which have a strong indirect effect, disregarding the variables that directly influence the response.
Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference
It is shown that model averaging is particularly useful if the predictive error of contributing model predictions is dominated by variance and if the covariance between models is low; for noisy data, which predominate in ecology, these conditions will often be met.
A Model Must Be Wrong to be Useful: The Role of Linear Modeling and False Assumptions in Theoretical Explanation
It is true that many times relationships in the real world do not fall into a linear pattern. Nevertheless, even if the true causal structure of the phenomenon under study is not linear, it does not…
Enhancing Validity in Observational Settings When Replication Is Not Possible
We argue that political scientists can provide additional evidence for the predictive validity of observational and quasi-experimental research designs by minimizing the expected prediction error or…
The Need for More Emphasis on Prediction: A “Nondenominational” Model-Based Approach
It is argued that the performance of a prediction procedure in repeated application is important and should play a significant role in its evaluation.
Reflection on modern methods: generalized linear models for prognosis and intervention—theory, practice and implications for machine learning
Five primary ways in which generalized linear models for prediction differ from GLMs for causal inference are identified, which will help ensure that both prediction and causal modelling are used appropriately and to greatest effect in health research.


To Explain or To Predict?
The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.
Causation, Prediction, and Search, 2nd Edition
What assumptions and methods allow us to turn observations into causal knowledge, and how can even incomplete causal knowledge be used in planning and prediction to influence and control our…
Prediction Versus Accommodation and the Risk of Overfitting
A new approach to the vexed problem of understanding the epistemic difference between prediction and accommodation is presented, floating the hypothesis that accommodation is a defective methodology only when the methods used to accommodate the data fail to guard against the risk of overfitting.
Instrumentalism, Parsimony, and the Akaike Framework
  • E. Sober
  • Mathematics
  • Philosophy of Science
  • 2002
Akaike’s framework for thinking about model selection in terms of the goal of predictive accuracy and his criterion for model selection have important philosophical implications. Scientists often…
The art of causal conjecture
The Art of Causal Conjecture shows that causal ideas can be equally important in theory and, by bringing causal ideas into the foundations of probability, allows causal conjectures to be more clearly quantified, debated, and confronted by statistical evidence.
Bayes model averaging with selection of regressors
When a number of distinct models contend for use in prediction, the choice of a single model can offer rather unstable predictions. In regression, stochastic search variable selection with Bayesian…
The Hierarchy Principle in Designed Industrial Experiments
The general question of appropriate criteria for the development of models from a designed experiment is considered, and the broader research question revolves around the choice of criteria for model building, variable selection and model discrimination.
Studies in the Logic of Explanation
To explain the phenomena in the world of our experience, to answer the question "why?" rather than only the question "what?", is one of the foremost objectives of all rational inquiry; and…
Predictive Analytics in Information Systems Research
Predictive analytics and explanatory statistical modeling are shown to be fundamentally disparate: they differ in each step of the modeling process, and these differences translate into different final models, so that a pure explanatory statistical model is best tuned for testing causal hypotheses and a pure predictive model is best in terms of predictive power.
Statistical modeling: The two cultures
If the goal as a field is to use data to solve problems, then the statistical community needs to move away from exclusive dependence on data models and adopt a more diverse set of tools.