Effect size, confidence interval and statistical significance: a practical guide for biologists

@article{Nakagawa2007EffectSC,
  title={Effect size, confidence interval and statistical significance: a practical guide for biologists},
  author={Shinichi Nakagawa and Innes C. Cuthill},
  journal={Biological Reviews},
  year={2007},
  volume={82}
}
Null hypothesis significance testing (NHST) is the dominant statistical approach in biology, although it has many, frequently unappreciated, problems. Most importantly, NHST does not provide us with two crucial pieces of information: (1) the magnitude of an effect of interest, and (2) the precision of the estimate of the magnitude of that effect. All biologists should be ultimately interested in biological importance, which may be assessed using the magnitude of an effect, but not its… 
Statistical Significance Versus Clinical Importance of Observed Effect Sizes: What Do P Values and Confidence Intervals Really Represent?
TLDR
This tutorial reviews different effect size measures and describes how confidence intervals can be used to address not only the statistical significance but also the clinical significance of the observed effect or association, and discusses what P values actually represent.
Power rangers: no improvement in the statistical power of analyses published in Animal Behaviour
The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum?
TLDR
A quick-and-easy guide to some simple yet powerful statistical options that augment or replace the p-value, and that are relatively straightforward to apply, to support biologists in adopting new approaches where they feel that the p-value alone is not doing their data justice.
Performing Contrast Analysis in Factorial Designs: From NHST to Confidence Intervals and Beyond
TLDR
This tutorial reviews these methods to guide researchers in answering the following questions: When I analyze mean differences in factorial designs, where can I find the effects of central interest, and what can I learn about their effect sizes?
Computation of measures of effect size for neuroscience data sets
TLDR
An open‐access MATLAB toolbox provides a wide range of MES to complement the frequently used types of hypothesis tests, such as t‐tests and analysis of variance, and should be useful to neuroscientists wishing to enhance their repertoire of statistical reporting.
Significance, Errors, Power, and Sample Size: The Blocking and Tackling of Statistics
TLDR
In research-related hypothesis testing, the term “statistically significant” is used to describe when an observed difference or association has met a certain threshold, which is denoted as alpha (α) and is typically set at .05.
Key steps to avoiding artistry with significance tests
Statistical significance provides evidence for or against an explanation of a population of interest, not a description of data sampled from the population. This simple distinction gets ignored in
Inference without significance: measuring support for hypotheses rather than rejecting them
TLDR
Inference based on significance testing is compared with model-based, likelihood and Bayesian inference using data on an endangered porpoise, Phocoena sinus, to find alternatives that lead to greater understanding and improved inference.
Invasive Plant Researchers Should Calculate Effect Sizes, Not P-Values
TLDR
Confidence intervals indicate effect sizes, and compared to P-values, confidence intervals provide more complete, intuitively appealing information on what data do/do not indicate, which helps build a case for confidence intervals as preferable alternatives to null hypothesis significance testing.

References

A farewell to Bonferroni: the problems of low statistical power and publication bias
TLDR
The meta-analysis on statistical power by Jennions and Moller (2003) revealed that, in the field of behavioral ecology and animal behavior, statistical power of less than 20% to detect a small effect and power of more than 50% to detect a medium effect existed.
Comparing effect sizes across variables: generalization without the need for Bonferroni correction
TLDR
The calculated effect sizes may be further used in simple analyses that can help to estimate the true effect of a predictor variable and thus make general conclusions, and the omission of nonsignificant results from publications is undesirable.
The case against retrospective statistical power analyses with an introduction to power analysis
Statistical power analysis is an important tool for planning an experiment because this type of analysis allows researchers to identify an appropriate sample size for a particular experimental
Impact of criticism of null-hypothesis significance testing on statistical reporting practices in conservation biology.
TLDR
Overall, results of the survey show some improvements in statistical practice, but further efforts are clearly required to move the discipline toward improved practices.
A Primer on the Understanding, Use, and Calculation of Confidence Intervals that are Based on Central and Noncentral Distributions
Reform of statistical practice in the social and behavioral sciences requires wider use of confidence intervals (CIs), effect size measures, and meta-analysis. The authors discuss four reasons for
Effect-Size Estimates: Issues and Problems in Interpretation
In recent years, researchers have recognized the importance of the concept of effect size for planning research, determining the significance of research results, and accumulating results across
A survey of the statistical power of research in behavioral ecology and animal behavior
TLDR
There was a significant correlation between power and reported p value for both first and last tests, suggesting that failure to observe significant relationships is partly owing to small sample sizes, as power increases with sample size.
Correct Confidence Intervals for Various Regression Effect Sizes and Parameters: The Importance of Noncentral Distributions in Computing Intervals
The advantages that confidence intervals have over null-hypothesis significance testing have been presented on many occasions to researchers in psychology. This article provides a practical
Review of assumptions and problems in the appropriate conceptualization of effect size.
TLDR
Estimation of the effect size parameter, D, the standardized difference between population means, is sensitive to heterogeneity of variance (heteroscedasticity), which seems to abound in psychological data, and various proposed solutions are reviewed, including measures that do not make these assumptions.