Remove, rather than redefine, statistical significance

Valentin Amrhein and Sander Greenland
Nature Human Behaviour
To the Editor — Benjamin et al.1 propose to redefine statistical significance with a trichotomy: what was once ‘highly significant’ (P < 0.005) becomes ‘significant’, what was once significant (P < 0.05) becomes ‘suggestive’, and what was ‘nonsignificant’ (P > 0.05) remains nonsignificant. Trichotomization is better than dichotomization, and we agree that P values around 0.05 convey only limited evidence against the tested hypothesis (which is usually a ‘null’ hypothesis of no effect)2. We also…
Why 'Redefining Statistical Significance' Will Not Improve Reproducibility and Could Make the Replication Crisis Worse
A recent proposal to "redefine statistical significance" (Benjamin, et al. Nature Human Behaviour, 2017) claims that false positive rates "would immediately improve" by factors greater than two and…
Abandon Statistical Significance
This work recommends dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences and argues that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures.
Redefining the Critical Value of Significance Level (0.005 instead of 0.05): The Bayes Trace
The precise meanings of some concepts, such as the p-value, the Bayes factor, and the minimum a posteriori probability of the null hypothesis, are discussed in this review, mainly through examples involving the comparison of frequencies.
The p value wars (again)
  • U. Dirnagl
  • Psychology, Medicine
  • European Journal of Nuclear Medicine and Molecular Imaging
  • 2019
The p value is at the heart of a much wider discussion which started in earnest about a decade ago in Psychology and quickly percolated through the life sciences in general, and where the p value, or rather its interpretation, takes center stage.
Abandon Statistical Significance
We discuss problems the null hypothesis significance testing (NHST) paradigm poses for replication and more broadly in the biomedical and social sciences as well as how these problems remain…
Redefining significance and reproducibility for medical research: A plea for higher P‐value thresholds for diagnostic and prognostic models
It is concluded that a lower P‐value threshold for declaring statistical significance implies more exaggeration in an estimated effect, which implies that if a low threshold is used, effect size estimation should not be attempted, for example in the context of selecting promising discoveries that need further validation.
The Impact of P-hacking on “Redefine Statistical Significance”
  • Harry Crane
  • Psychology
  • Basic and Applied Social Psychology
  • 2018
Abstract: In their proposal to “redefine statistical significance,” Benjamin et al. claim that lowering the default cutoff for statistical significance from .05 to .005 would “immediately improve the…
Manipulating the Alpha Level Cannot Cure Significance Testing
We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious for the finding of new…
Beyond psychology: prevalence of p value and confidence interval misinterpretation across different fields
P values and confidence intervals (CIs) are the most widely used statistical indices in scientific literature. Several surveys have revealed that these two indices are generally misunderstood.…
Three Recommendations for Improving the Use of p-Values
Abstract: Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly…


The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research
The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process, and potential arguments against removing significance thresholds are discussed.
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of…
Invited Commentary: The Need for Cognitive Science in Methodology
  • S. Greenland
  • Computer Science, Medicine
  • American journal of epidemiology
  • 2017
It is concluded that methodological development and training should go beyond coverage of mechanistic biases to cover distortions of conclusions produced by statistical methods and psychosocial forces.
The Long Way From α-Error Control to Validity Proper
It is argued that, given the current state of affairs in behavioral science, false negatives often constitute a more serious problem and a scientific culture rewarding strong inference is more likely to see progress than a culture preoccupied with tightening its standards for the mere publication of original findings.
The ASA Statement on p-Values: Context, Process, and Purpose
Cobb’s concern was a long-worrisome circularity in the sociology of science based on the use of bright lines such as p < 0.05: “We teach it because it’s what we do; we do it because it’s what we…
For and Against Methodologies: Some Perspectives on Recent Causal and Statistical Inference Debates
It is argued that, once these misconceptions are removed, most elements of the opposing views can be reconciled and the chief problem of causal inference becomes one of how to teach sound use of formal methods and how to apply them without generating the overconfidence and misinterpretations that have ruined so many statistical practices.
Power failure: why small sample size undermines the reliability of neuroscience
It is shown that the average statistical power of studies in the neurosciences is very low, and the consequences include overestimates of effect size and low reproducibility of results.
Statistical inference : a commentary for the social and behavioural sciences / Michael Oakes
Preface. ON SIGNIFICANCE TESTS: The Logic of the Significance Test; A Critique of Significance Tests; Intuitive Statistical Judgements. SCHOOLS OF STATISTICAL INFERENCE: Theories of Probability; Further…
Competing interests: The authors declare no competing interests. Nature Human Behaviour, Vol. 2, January 2018, p. 4.
  • Stat. 70,
  • 2016