Power failure: why small sample size undermines the reliability of neuroscience

@article{Button2013PowerFW,
  title={Power failure: why small sample size undermines the reliability of neuroscience},
  author={Katherine S. Button and John P. A. Ioannidis and Claire Mokrysz and Brian A. Nosek and Jonathan Flint and Emma S. J. Robinson and Marcus Robert Munafò},
  journal={Nature Reviews Neuroscience},
  year={2013},
  volume={14},
  pages={365--376}
}
A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is… 
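As a quick illustration of the abstract's core claim, here is a minimal Python sketch of the positive predictive value (PPV) formula the paper builds on, PPV = (1 − β)R / ((1 − β)R + α), where 1 − β is power, α is the significance threshold, and R is the pre-study odds that a probed effect is real; the choice R = 0.25 is an illustrative assumption, not a value from the paper:

def ppv(power, alpha=0.05, prior_odds=0.25):
    """Probability that a statistically significant result reflects a true effect."""
    return (power * prior_odds) / (power * prior_odds + alpha)

for power in (0.8, 0.5, 0.2):
    print(f"power={power:.1f} -> PPV={ppv(power):.2f}")

# With R = 0.25 (one in five probed effects is real), dropping power
# from 0.8 to 0.2 lowers PPV from 0.80 to 0.50: half of all
# "significant" findings would then be false positives.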
Is neuroscience facing up to statistical power?
TLDR
A review of the sample size justifications provided in all 15 papers published in one recent issue of the leading journal Nature Neuroscience indicates that concerns regarding statistical power in neuroscience have mostly not yet been addressed.
How to Enhance the Power to Detect Brain–Behavior Correlations With Limited Resources
TLDR
Ad hoc simulations show that statistical power crucially depends on the choice of behavioral and neural measures, as well as on sampling strategy, and that behavioral prescreening and the selection of extreme groups can secure a high degree of robust in-sample variance.
Power to the People: Power, Negative Results and Sample Size.
  • B. Gaskill, J. Garner
  • Psychology
    Journal of the American Association for Laboratory Animal Science: JAALAS
  • 2019
TLDR
What power is, how it can be calculated, and what to report when a null result is found are reviewed, so that readers and reviewers can determine whether an experiment had sufficient power.
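Since the entry above concerns how power can be calculated, here is a minimal sketch of a conventional a-priori power calculation using statsmodels; the library choice and the medium effect size (Cohen's d = 0.5) are illustrative assumptions, not taken from the paper:

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (Cohen's d = 0.5)
# with 80% power at alpha = 0.05 in a two-sample t-test.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"n per group: {n_per_group:.0f}")  # about 64

# Conversely, the power actually achieved by a small study (n = 15 per group).
achieved = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=15)
print(f"power with n=15 per group: {achieved:.2f}")  # about 0.26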
A systematic review of sample size and power in leading neuroscience journals
TLDR
It is suggested that reporting checklists may not improve the use and reporting of formal power calculations, and there is little evidence that sample sizes were adequate to achieve conventional levels of statistical power, even for large effect sizes.
Power-up: A Reanalysis of 'Power Failure' in Neuroscience Using Mixture Modeling
TLDR
It is found that statistical power is extremely low for studies included in meta-analyses that reported a null result and that it varies substantially across subfields of neuroscience, with particularly low power in candidate gene association studies.
How sample size influences the replicability of task-based fMRI
TLDR
The degree of replicability for typical sample sizes is modest, and even sample sizes much larger than typical produce results that fall well short of perfectly replicable; this joins the existing line of work advocating for larger sample sizes.
Small sample sizes reduce the replicability of task-based fMRI studies
TLDR
Replicability of task-based fMRI studies is assessed as a function of sample size, and standards requiring larger sample sizes, potentially in excess of N = 100, are advocated.
Effect size and statistical power in the rodent fear conditioning literature – A systematic review
TLDR
Effect sizes and statistical power have a wide distribution in the rodent fear conditioning literature, but do not seem to have a large influence on how results are described or cited.
Hypothesis-Testing Improves the Predicted Reliability of Neuroscience Research
TLDR
A sample of neuroscience publications is reviewed to estimate the prevalence and extensiveness of hypothesis-testing research, and a method for combining test results is applied to show that the practice of testing multiple predictions of hypotheses increases the predicted reliability of neuroscience research.
...

References

SHOWING 1-10 OF 132 REFERENCES
A peculiar prevalence of p values just below .05
In null hypothesis significance testing (NHST), p values are judged relative to an arbitrary threshold for significance (.05). The present work examined whether that standard influences the distribution of p values reported in the psychology literature…
False-Positive Psychology
TLDR
It is shown that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings, flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates, and a simple, low-cost, and straightforwardly effective disclosure-based solution is suggested.
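A minimal simulation of the mechanism this entry describes, under assumed conditions (two independent dependent variables, no true effect; not the authors' own code): testing both outcomes and reporting whichever is significant roughly doubles the false-positive rate.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 20, 10_000, 0.05
false_pos = 0
for _ in range(reps):
    a = rng.normal(size=(n, 2))  # "control" group: two independent DVs, no true effect
    b = rng.normal(size=(n, 2))  # "treatment" group: identical distribution
    p1 = stats.ttest_ind(a[:, 0], b[:, 0]).pvalue
    p2 = stats.ttest_ind(a[:, 1], b[:, 1]).pvalue
    false_pos += (p1 < alpha) or (p2 < alpha)

print(false_pos / reps)  # roughly 0.10 rather than the nominal 0.05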
Why Most Published Research Findings Are False
TLDR
Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.
An Open, Large-Scale, Collaborative Effort to Estimate the Reproducibility of Psychological Science
  • Brian A. Nosek, D. Lakens
  • Psychology
    Perspectives on psychological science: a journal of the Association for Psychological Science
  • 2012
TLDR
The Reproducibility Project is an open, large-scale, collaborative effort to systematically examine the rate and predictors of reproducibility in psychological science.
An exploratory test for an excess of significant findings
TLDR
A test to explore biases stemming from the pursuit of nominal statistical significance was developed and demonstrated a clear or possible excess of significant studies in 6 of 8 large meta-analyses and in the wide domain of neuroleptic treatments.
Why Science Is Not Necessarily Self-Correcting
  • J. Ioannidis
  • Psychology
    Perspectives on psychological science: a journal of the Association for Psychological Science
  • 2012
TLDR
A number of impediments to self-correction that have been empirically studied in psychological science are cataloged, and some proposed solutions to promote sound replication practices that would enhance the credibility of scientific results are discussed.
Data from Paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”
The data include measures collected for the two experiments reported in “False-Positive Psychology” [1], where listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2).
...