Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing

@article{Hubbard2008WhyPV,
  title={Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing},
  author={Raymond Hubbard and R. Murray Lindsay},
  journal={Theory \& Psychology},
  year={2008},
  volume={18},
  pages={69--88}
}
Reporting p values from statistical significance tests is common in psychology's empirical literature. Sir Ronald Fisher saw the p value as playing a useful role in knowledge development by acting as an 'objective' measure of inductive evidence against the null hypothesis. We review several reasons why the p value is an unobjective and inadequate measure of evidence when statistically testing hypotheses. A common theme throughout many of these reasons is that p values exaggerate the evidence…
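The abstract's claim that p values exaggerate the evidence against the null can be made concrete with two standard calibrations from this literature; they are not derived in the abstract itself, and the function names below are illustrative, not from the paper. For 0 < p < 1/e, the Bayes factor in favour of the null implied by a p value alone can be no smaller than −e · p · ln(p) (Sellke, Bayarri and Berger's bound), and for a two-sided normal z-test of a point null the likelihood ratio L(H0)/L(H1) can be no smaller than exp(−z²/2). A minimal Python sketch under those assumptions:

import math
from statistics import NormalDist

def min_bayes_factor_sellke(p):
    """Sellke-Bayarri-Berger calibration: for 0 < p < 1/e, the Bayes factor
    in favour of the null implied by a p value can be no smaller than -e*p*ln(p)."""
    if not (0.0 < p < 1.0 / math.e):
        raise ValueError("calibration holds only for 0 < p < 1/e")
    return -math.e * p * math.log(p)

def min_likelihood_ratio_normal(p_two_sided):
    """Two-sided normal z-test of a point null: the likelihood ratio
    L(H0)/L(H1) is minimised by placing H1 at the observed z, giving exp(-z^2/2)."""
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return math.exp(-z * z / 2.0)

if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        bf_floor = min_bayes_factor_sellke(p)      # smallest possible BF for H0
        lr_floor = min_likelihood_ratio_normal(p)  # smallest possible L(H0)/L(H1)
        print(f"p = {p:<6} evidence against H0 is at most "
              f"{1.0 / bf_floor:5.1f}:1 (Bayes factor bound), "
              f"{1.0 / lr_floor:6.1f}:1 (likelihood ratio bound)")

On this calibration a "just significant" p = 0.05 corresponds at best to odds of roughly 2.5 to 1 against the null under the Bayes-factor bound, which is the sense in which the paper, and the Goodman (1993) reference listed below, say that p values overstate the evidence.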

Citations

Hail the impossible: p-values, evidence, and likelihood.

  • T. Johansson, Scandinavian Journal of Psychology, 2011
Using p in the Fisherian sense as a measure of statistical evidence is deeply problematic, both statistically and conceptually, while the Neyman-Pearson interpretation is not about evidence at all.

What is the value of a p value?

To P or not to P: on the evidential nature of P-values and their place in scientific inference

It is shown that P-values quantify experimental evidence not by their numerical value, but through the likelihood functions that they index.

Statistical Significance and the Dichotomization of Evidence

Abstract In light of recent concerns about reproducibility and replicability, the ASA issued a Statement on Statistical Significance and p-values aimed at those who are not primarily statisticians.

Abandon Statistical Significance

This work recommends dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences and argues that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures.

P values are only an index to evidence: 20th- vs. 21st-century statistical science.

The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology.

Bayes factor and posterior probability: Complementary statistical evidence to p-value.

Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values

Abstract The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent…

Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence

Dichotomization of evidence is reduced though still present when researchers are asked to make decisions based on the evidence, particularly when the decision outcome is personally consequential.

Time to dispense with the p-value in OR?

P-values are an inadequate choice for a succinct executive summary of statistical evidence for or against a research question, and in statistical summaries confidence intervals of standardized effect sizes provide much more information than p-values without requiring much more space.
...

References

Showing 1-10 of 109 references

p values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate.

  • S. Goodman, American Journal of Epidemiology, 1993
An analysis using another method promoted by Fisher, mathematical likelihood, shows that the p value substantially overstates the evidence against the null hypothesis.

Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence

Abstract The problem of testing a point null hypothesis (or a “small interval” null hypothesis) is considered. Of interest is the relationship between the P value (or observed significance level) and…

Confusion Over Measures of Evidence (p's) Versus Errors (α's) in Classical Statistical Testing

Confusion surrounding the reporting and interpretation of results of classical statistical tests is widespread among applied researchers, most of whom erroneously believe that such tests are…

P Values are not Error Probabilities

Confusion surrounding the reporting and interpretation of results of classical statistical tests is widespread among applied researchers. The confusion stems from the fact that most of these…

If Statistical Significance Tests are Broken/Misused, What Practices Should Supplement or Replace Them?

Given some consensus that statistical significance tests are broken, misused or at least have somewhat limited utility, the focus of discussion within the field ought to move beyond additional…

The appropriate use of null hypothesis testing.

The many criticisms of null hypothesis testing suggest when it is not useful and what it should not be used for. This article explores when and why its use is appropriate. Null hypothesis testing is…

The Historical Growth of Statistical Significance Testing in Psychology--and Its Future Prospects.

The historical growth in the popularity of statistical significance testing is examined using a random sample of annual data from 12 American Psychological Association (APA) journals. The results…

P Values: What They are and What They are Not

Abstract P values (or significance probabilities) have been used in place of hypothesis tests as a means of giving more information about the relationship between the data and the hypothesis than…

Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers

Data analysis methods in psychology still emphasize statistical significance testing, despite numerous articles demonstrating its severe deficiencies. It is now possible to use meta-analysis to show…

Null hypothesis significance testing: a review of an old and continuing controversy.

The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data.
...