Statistical Evidence in Experimental Psychology

@article{Wetzels2011StatisticalEI,
  title={Statistical Evidence in Experimental Psychology},
  author={Ruud Wetzels and D{\'o}ra Matzke and Michael David Lee and Jeffrey N. Rouder and Geoffrey J. Iverson and Eric-Jan Wagenmakers},
  journal={Perspectives on Psychological Science},
  year={2011},
  volume={6},
  pages={291--298}
}
Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace inference with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values… 
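
To make the three kinds of evidence named in the abstract concrete, the sketch below computes a two-sided p value, an effect size (Cohen's d), and an approximate Bayes factor for a single one-sample t test. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the p value is obtained by numerical integration rather than a statistics library, and the Bayes factor uses the BIC approximation (unit-information prior), which is not necessarily the Bayes factor computed in the paper.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p value, integrating the density on [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_mass)

def cohens_d(t, n):
    """Standardized effect size for a one-sample t test: d = t / sqrt(n)."""
    return t / math.sqrt(n)

def bic_bf01(t, n):
    """BIC approximation to the Bayes factor for H0 over H1 (one-sample t test).

    Assumption: unit-information prior, as implied by the BIC; values above 1
    favor H0 and values below 1 favor H1.
    """
    df = n - 1
    return math.sqrt(n) * (1 + t * t / df) ** (-n / 2)

if __name__ == "__main__":
    t_stat, n_obs = 2.228, 11  # hypothetical result near the .05 threshold
    print("p     =", round(two_sided_p(t_stat, n_obs - 1), 4))
    print("d     =", round(cohens_d(t_stat, n_obs), 3))
    print("BF10  =", round(1 / bic_bf01(t_stat, n_obs), 2))
```

Running the three functions on the same t statistic shows how a result that just reaches p = .05 can look quite different when expressed as an effect size or as a Bayes factor.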


Estimating the evidential value of significant results in psychological science
TLDR
It is concluded that due to the threshold of acceptance having been set too low for psychological findings, a substantial proportion of the published results have weak evidential support.
Bayesian inference for psychology. Part II: Example applications with JASP
TLDR
This part of this series introduces JASP (http://www.jasp-stats.org), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems.
The Heuristic Value of p in Inductive Statistical Inference
TLDR
It is concluded that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst.
Beyond p values: utilizing multiple methods to evaluate evidence
Null hypothesis significance testing is cited as a threat to validity and reproducibility. While many individuals suggest that we focus on altering the p value at which we deem an effect significant,
Four reasons to prefer Bayesian analyses over significance testing
TLDR
It is argued that appropriate conclusions match the Bayesian inferences, but not those based on significance testing, where they disagree; it is shown that a high-powered non-significant result is consistent with no evidence for H0 over H1 worth mentioning, which a Bayes factor can show.
A Systematic Review of Bayesian Articles in Psychology: The Last 25 Years
TLDR
It is found in this review that the use of Bayes has increased and broadened in the sense that this methodology can be used in a flexible manner to tackle many different forms of questions.
Introduction to Special Section on Bayesian Data Analysis
  • J. Kruschke
  • Psychology
    Perspectives on psychological science : a journal of the Association for Psychological Science
  • 2011
TLDR
Bayesian data analysis offers an alternative approach that solves the problems of NHST and also provides richer, more informative inferences and more flexible application, as well as addressing questions about null values.
A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting its Use
TLDR
Applying a 10,000-case simulation of null hypothesis significance testing, the authors found that p-values’ inferential signals to either reject or not reject a null hypothesis about the mean were consistent for almost 70% of the cases with the parameter’s true location for the sampled-from population.
Using Bayes Factors to Test Hypotheses in Developmental Research
TLDR
The concept of Bayes factors as inferential tools that can serve as an alternative to null hypothesis significance testing in the day-to-day work of developmental researchers is discussed.

References

SHOWING 1-10 OF 88 REFERENCES
The p-value fallacy and how to avoid it.
  • P. Dixon
  • Psychology
    Canadian journal of experimental psychology = Revue canadienne de psychologie experimentale
  • 2003
TLDR
It is argued that the method to identify competing interpretations of the data and then use likelihood ratios to assess which interpretation provides the better account satisfies a principle of "graded evidence," according to which similar data should provide similar evidence.
A practical solution to the pervasive problems of p values
TLDR
The BIC provides an approximation to a Bayesian hypothesis test, does not require the specification of priors, and can be easily calculated from SPSS output.
Bayesian data analysis.
  • J. Kruschke
  • Political Science
    Wiley interdisciplinary reviews. Cognitive science
  • 2010
TLDR
A fatal flaw of NHST is reviewed, some benefits of Bayesian data analysis are introduced, and illustrative examples of multiple comparisons in Bayesian analysis of variance and Bayesian approaches to statistical power are presented.
Bayesian statistical inference in psychology: comment on Trafimow (2003).
TLDR
This comment, with the help of a simple example, explains the usefulness of Bayesian inference for psychology.
Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better
  • G. Cumming
  • Psychology
    Perspectives on psychological science : a journal of the Association for Psychological Science
  • 2008
TLDR
The p value is so unreliable and gives such dramatically vague information that it is a poor basis for inference; researchers should minimize the role of p by using confidence intervals and model-fitting techniques and by adopting meta-analytic thinking.
Bayesian Versus Orthodox Statistics: Which Side Are You On?
  • Z. Dienes
  • Psychology
    Perspectives on psychological science : a journal of the Association for Psychological Science
  • 2011
TLDR
This article presents some common situations in which Bayesian and orthodox approaches to significance testing come to different conclusions; the reader is shown how to apply Bayesian inference in practice, using free online software, to allow more coherent inferences from data.
Beyond statistical inference: a decision theory for science.
  • P. Killeen
  • Economics
    Psychonomic bulletin & review
  • 2006
TLDR
The decision theory proposed here calculates the expected utility of an effect on the basis of the probability of replicating it and a utility function on its size, consistent with alternate measures of effect size, such as r2 and information transmission, and with Bayesian model selection criteria.
Statistical Methods in Psychology Journals: Guidelines and Explanations
In the light of continuing debate over the applications of significance testing in psychology journals and following the publication of Cohen's (1994) article, the Board of Scientific Affairs (BSA)
Beyond statistical inference: A decision theory for science
Traditional null hypothesis significance testing does not yield the probability of the null or its alternative and, therefore, cannot logically ground scientific decisions. The decision theory