Remove, rather than redefine, statistical significance

@article{Amrhein2017RemoveRT,
  title={Remove, rather than redefine, statistical significance},
  author={Valentin Amrhein and Sander Greenland},
  journal={Nature Human Behaviour},
  year={2017},
  volume={2},
  pages={4}
}
To the Editor — Benjamin et al.¹ propose to redefine statistical significance with a trichotomy: what was once ‘highly significant’ (P < 0.005) becomes ‘significant’, what was once significant (P < 0.05) becomes ‘suggestive’, and what was ‘nonsignificant’ (P > 0.05) remains nonsignificant. Trichotomization is better than dichotomization, and we agree that P values around 0.05 convey only limited evidence against the tested hypothesis (which is usually a ‘null’ hypothesis of no effect)². We also…

Why 'Redefining Statistical Significance' Will Not Improve Reproducibility and Could Make the Replication Crisis Worse

A recent proposal to "redefine statistical significance" (Benjamin, et al. Nature Human Behaviour, 2017) claims that false positive rates "would immediately improve" by factors greater than two and

Abandon Statistical Significance

This work recommends dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences and argues that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures.

The p value wars (again)

  • U. Dirnagl
  • Psychology
    European Journal of Nuclear Medicine and Molecular Imaging
  • 2019
The p value is at the heart of a much wider discussion which started in earnest about a decade ago in Psychology and quickly percolated through the life sciences in general, and where the p value, or rather its interpretation, takes center stage.

The Impact of P-hacking on “Redefine Statistical Significance”

  • Harry Crane
  • Economics
    Basic and Applied Social Psychology
  • 2018
Abstract In their proposal to “redefine statistical significance,” Benjamin et al. claim that lowering the default cutoff for statistical significance from .05 to .005 would “immediately improve the

Manipulating the Alpha Level Cannot Cure Significance Testing

We argue that making accept/reject decisions on scientific hypotheses, including a recent call for changing the canonical alpha level from p = 0.05 to p = 0.005, is deleterious for the finding of new

Beyond psychology: prevalence of p value and confidence interval misinterpretation across different fields

P values and confidence intervals (CIs) are the most widely used statistical indices in scientific literature. Several surveys have revealed that these two indices are generally misunderstood.

Why and how we should join the shift from significance testing to estimation

It is concluded that studies in ecology and evolutionary biology are mostly exploratory and descriptive, and should shift from claiming to ‘test’ specific hypotheses statistically to describing and discussing many hypotheses (possible true effect sizes) that are most compatible with the authors' data, given their statistical model.

Three Recommendations for Improving the Use of p-Values

ABSTRACT Researchers commonly use p-values to answer the question: How strongly does the evidence favor the alternative hypothesis relative to the null hypothesis? p-Values themselves do not directly

P values in display items are ubiquitous and almost invariably significant: A survey of top science journals

Substantial and growing reliance on P values in display items is demonstrated, with increases of 2.5 to 14.5 times in 2017 compared to 1997, and wider appreciation of the need for multiplicity corrections is a welcome evolution.

References

The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research

The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process, and potential arguments against removing significance thresholds are discussed.

Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations

Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of

Invited Commentary: The Need for Cognitive Science in Methodology

It is concluded that methodological development and training should go beyond coverage of mechanistic biases to cover distortions of conclusions produced by statistical methods and psychosocial forces.

The Long Way From α-Error Control to Validity Proper

It is argued that, given the current state of affairs in behavioral science, false negatives often constitute a more serious problem and a scientific culture rewarding strong inference is more likely to see progress than a culture preoccupied with tightening its standards for the mere publication of original findings.

The ASA Statement on p-Values: Context, Process, and Purpose

Cobb’s concern was a long-worrisome circularity in the sociology of science based on the use of bright lines such as p < 0.05: “We teach it because it’s what we do; we do it because it’s what we

For and Against Methodologies: Some Perspectives on Recent Causal and Statistical Inference Debates

It is argued that, once these misconceptions are removed, most elements of the opposing views can be reconciled and the chief problem of causal inference becomes one of how to teach sound use of formal methods and how to apply them without generating the overconfidence and misinterpretations that have ruined so many statistical practices.

Power failure: why small sample size undermines the reliability of neuroscience

It is shown that the average statistical power of studies in the neurosciences is very low, and the consequences include overestimates of effect size and low reproducibility of results.

Statistical inference : a commentary for the social and behavioural sciences / Michael Oakes

Preface ON SIGNIFICANCE TESTS: The Logic of the Significance Test A Critique of Significance Tests Intuitive Statistical Judgements SCHOOLS OF STATISTICAL INFERENCE: Theories of Probability Further

Competing interests: The authors declare no competing interests. Nature Human Behaviour | VOL 2 | JANUARY 2018 | 4 | www.nature.com/nathumbehav

  • Stat. 70,
  • 2016