Justify your alpha

@article{lakens2018justify,
  title={Justify your alpha},
  author={Daniel Lakens and Federico G. Adolfi and Casper J. Albers and Farid Anvari and Matthew A. J. Apps and Shlomo Engelson Argamon and Thom S Baguley and Raymond Becker and Stephen D. Benning and Daniel Bradford and Erin Michelle Buchanan and Aaron R. Caldwell and Ben Van Calster and Rickard Carlsson and Sau-chin Chen and Bryan Chung and Lincoln J. Colling and Gary S. Collins and Zander Crook and Emily S. Cross and Sameera Daniels and Henrik Danielsson and Lisa Marie DeBruine and Daniel J. Dunleavy and Brian D. Earp and Michele I. Feist and Jason Ferrell and James G. Field and Nicholas William Fox and Amanda Friesen and Caio Peres Gomes and Monica Gonzalez-Marquez and James A. Grange and Andrew P. Grieve and Robert Guggenberger and James T. Grist and Anne-Laura Harmelen and Fred Hasselman and Kevin D. Hochard and Mark Romeo Hoffarth and Nicholas Paul Holmes and Michael Ingre and Peder Mortvedt Isager and Hanna Isotalus and Christer Johansson and Konrad Juszczyk and David Anthony Kenny and Ahmed A. Khalil and Barbara Konat and Junpeng Lao and Erik Gahner Larsen and Gerine M. A. Lodder and Jiř{\'i} Lukavsk{\'y} and Christopher R. Madan and David Manheim and Stephen R Martin and Andrea E. Martin and Deborah G. Mayo and Randy J McCarthy and Kevin McConway and Colin McFarland and Amanda Q. X. Nio and Gustav Nilsonne and Cilene L Oliveira and Jean-Jacques Orban de Xivry and Samantha Parsons and Gerit Pfuhl and Kimberly A. Quinn and John J. Sakon and Selahattin Adil Saribay and Iris K. Schneider and Manojkumar Selvaraju and Zsuzsika Sjoerds and Samuel G. Smith and Tim Smits and Jeffrey R. Spies and Vishnu Sreekumar and Crystal Nicole Steltenpohl and Neil Stenhouse and Wojciech Świątkowski and Miguel A. Vadillo and Marcel A.L.M. van Assen and Matt N Williams and Samantha E. Williams and Donald R. Williams and Tal Yarkoni and Ignazio Ziano and Rolf A. Zwaan},
  journal={Nature Human Behaviour},
  year={2018}
}
In response to recommendations to redefine statistical significance to P ≤ 0.005, we propose that researchers should transparently report and justify all choices they make when designing a study, including the alpha level. 


Manipulating the Alpha Level Cannot Cure Significance Testing
We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious for the finding of new …
How redefining statistical significance can worsen the replication crisis
In response to the replication crisis in science, a group of prominent scholars has proposed redefining statistical significance by reducing the p-value significance threshold from 0.05 to 0.005.
Redefine or justify? Comments on the alpha debate.
Benjamin et al. (Nature Human Behaviour, 2, 6–10, 2018) proposed improving the reproducibility of findings in psychological research by lowering the alpha level of our conventional null hypothesis significance tests …
The Alpha War
  • E. Machery
  • Psychology
    Review of Philosophy and Psychology
  • 2019
Benjamin et al. (Nature Human Behaviour, 2 (1), 6–10, 2018) proposed decreasing the significance level by an order of magnitude to improve the replicability of psychology. This modest, practical …
To P or not to P? The Usefulness of P‐values in Quantitative Political Science Research
This contribution gives a short overview of the mechanics of significance testing in inferential statistics, in particular linear models, and tries to put the discussion about the usefulness of …
Redefine or justify? Comments on the alpha debate
Given that it is highly unlikely that the field will abandon the NHST paradigm any time soon, lowering the alpha level to .005 is the best way to combat the replication crisis in psychology.
Beyond p values: utilizing multiple methods to evaluate evidence
Null hypothesis significance testing is cited as a threat to validity and reproducibility. While many individuals suggest that we focus on altering the p value at which we deem an effect significant, …
Null hypothesis significance testing and effect sizes: can we 'effect' everything … or … anything?
  • D. Lovell
  • Medicine, Computer Science
    Current opinion in pharmacology
  • 2020
The Null Hypothesis Significance Testing (NHST) paradigm is increasingly criticized and new methods, especially Bayesian approaches, are being developed; however, no single method provides a simple answer.
A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting its Use
When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a …
Redefining significance and reproducibility for medical research: A plea for higher P‐value thresholds for diagnostic and prognostic models
It is concluded that a lower P-value threshold for declaring statistical significance implies more exaggeration in estimated effects; if a low threshold is used, effect-size estimation should therefore not be attempted, for example when selecting promising discoveries that need further validation.
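The "exaggeration" claim above is the winner's curse: conditioning on statistical significance inflates effect estimates, and a stricter threshold inflates them more. A minimal simulation sketch, using an illustrative one-sided one-sample z-test with assumed numbers (true effect d = 0.3 SD, n = 20), not the cited paper's models:

```python
# Winner's curse sketch: average the effect estimate only over "significant"
# simulated studies, at two alpha levels. All numbers are illustrative.
import random
from statistics import NormalDist, mean

def mean_significant_estimate(alpha, d=0.3, n=20, sims=20000, seed=1):
    """Average effect estimate among simulated runs that reached significance."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided rejection threshold
    se = 1 / n ** 0.5                          # standard error of the mean
    kept = [est for est in (rng.gauss(d, se) for _ in range(sims))
            if est / se > z_crit]              # keep only "significant" runs
    return mean(kept)

# A stricter alpha yields a larger conditional overestimate of the true 0.3:
print(mean_significant_estimate(0.005) > mean_significant_estimate(0.05) > 0.3)
```

Every retained estimate must clear the significance cutoff, so the conditional average sits above the true effect, and a lower alpha raises that cutoff further.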


Redefine statistical significance
The default P-value threshold for statistical significance is proposed to be changed from 0.05 to 0.005 for claims of new discoveries, in order to reduce the rate of false positives among claimed discoveries.
How to test hypotheses if you must.
  • A. Grieve
  • Medicine, Mathematics
    Pharmaceutical statistics
  • 2015
This paper investigates the implications of this for testing in drug development and demonstrates that its adoption leads directly to the likelihood principle and Bayesian approaches.
Statistical Inference as Severe Testing
This book pulls back the cover on disagreements between experts charged with restoring integrity to science, and denies two pervasive views of the role of probability in inference: to assign degrees of belief, and to control error rates in a long run.
On the Reproducibility of Psychological Science
The results of this reanalysis provide a compelling argument for both increasing the threshold required for declaring scientific discoveries and for adopting statistical summaries of evidence that account for the high proportion of tested hypotheses that are false.
Estimating the reproducibility of psychological science
A large-scale assessment suggests that experimental reproducibility in psychology leaves a lot to be desired, and correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
Pharmaceutical Statistics
… suggestions to aid in the process of adjudication in light of Daubert and its progeny. The final paper in the book, “Judging ‘Good Science’: Toward Cooperation Between Scientists and Lawyers,” was …
Setting an Optimal α That Minimizes Errors in Null Hypothesis Significance Tests
This work finds no rationale for continuing to arbitrarily use α = 0.05 for null hypothesis significance tests in any field, when it is possible to determine an optimal α that results in stronger scientific inferences.
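The "optimal α" idea can be sketched numerically: choose the α that minimizes a weighted sum of Type I and Type II error rates. The one-sided one-sample z-test, effect size, sample size, and equal error costs below are illustrative assumptions, not values taken from the paper:

```python
# Grid-search the alpha minimizing cost_ratio * alpha + beta(alpha) for an
# assumed test and effect size. All parameter values are illustrative.
from statistics import NormalDist

norm = NormalDist()

def type2_error(alpha, d, n):
    """Beta of a one-sided one-sample z-test when the true effect is d (in SD units)."""
    z_crit = norm.inv_cdf(1 - alpha)
    return norm.cdf(z_crit - d * n ** 0.5)

def optimal_alpha(d, n, cost_ratio=1.0):
    """Alpha minimizing the weighted total error rate, by grid search."""
    grid = (a / 100000 for a in range(1, 20000))   # alpha in (0, 0.2)
    return min(grid, key=lambda a: cost_ratio * a + type2_error(a, d, n))

print(round(optimal_alpha(d=0.5, n=30), 3))   # about 0.085 here, not 0.05
```

Changing the effect size, sample size, or relative cost of the two errors moves the optimum, which is the paper's point: the right α depends on the study, not on convention.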
Discovering the Significance of 5 sigma
We discuss the traditional criterion for discovery in Particle Physics of requiring a significance corresponding to at least 5 sigma; and whether a more nuanced approach might be better.
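The particle-physics "5 sigma" criterion maps onto the p-value scale via the standard normal tail probability; a small sketch for comparison with the 0.05 and 0.005 thresholds discussed above:

```python
# Convert an n-sigma discovery criterion to a one-sided p-value.
from statistics import NormalDist

def sigma_to_p(n_sigma):
    """One-sided tail probability beyond n_sigma standard deviations."""
    return 1 - NormalDist().cdf(n_sigma)

print(sigma_to_p(5))   # about 2.9e-7, orders of magnitude below 0.005
```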
Assessing the Probability That a Positive Report is False: An Approach for Molecular Epidemiology Studies
This commentary shows how to assess the false positive report probability (FPRP), how to use it to decide whether a finding deserves attention or is "noteworthy," and how this approach can lead to improvements in the design, analysis, and interpretation of molecular epidemiology studies.
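The quantity described above, the chance that a "significant" result reflects a null effect, follows from Bayes' rule given the alpha level, the statistical power, and the prior probability that the tested association is real. A minimal sketch with illustrative parameter values (not figures from the commentary):

```python
# False positive report probability: P(association is null | significant result).
def fprp(alpha, power, prior):
    """FPRP via Bayes' rule; prior = P(the tested association is real)."""
    false_pos = alpha * (1 - prior)   # nulls wrongly flagged as significant
    true_pos = power * prior          # real effects correctly detected
    return false_pos / (false_pos + true_pos)

# With alpha = 0.05, 80% power, and a 1-in-10 prior, 36% of positives are false:
print(round(fprp(0.05, 0.80, 0.10), 2))   # 0.36
```

Lowering alpha or raising the prior plausibility of tested hypotheses both shrink the FPRP, which is why the alpha debate keeps returning to this quantity.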
Registered Reports: Realigning incentives in scientific publishing
Registered Reports are a publishing format in which study protocols are peer reviewed and provisionally accepted before data collection, decoupling the decision to publish from the results obtained.