# Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing

@article{Hubbard2008WhyPV, title={Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing}, author={Raymond Hubbard and Rachael Lindsay}, journal={Theory \& Psychology}, year={2008}, volume={18}, pages={69 - 88} }

Reporting p values from statistical significance tests is common in psychology's empirical literature. Sir Ronald Fisher saw the p value as playing a useful role in knowledge development by acting as an 'objective' measure of inductive evidence against the null hypothesis. We review several reasons why the p value is an unobjective and inadequate measure of evidence when statistically testing hypotheses. A common theme throughout many of these reasons is that p values exaggerate the evidence…
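The claim that p values exaggerate the evidence against the null hypothesis can be illustrated numerically. The following is a minimal sketch, not the authors' own calculation: it assumes a two-sided z test and compares the p value with the minimum likelihood ratio exp(-z²/2), a standard bound on how strongly normally distributed data can favor any point alternative over the null. The function names `min_likelihood_ratio` and `min_posterior_null` are hypothetical, and the 50/50 prior is an assumption chosen only for illustration.

```python
import math
from statistics import NormalDist


def min_likelihood_ratio(p_two_sided: float) -> float:
    """Smallest possible likelihood ratio L(H0) / L(H1) for a two-sided z test.

    Assumes a normal test statistic; the bound is reached when the
    alternative is placed exactly at the observed effect.
    """
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return math.exp(-0.5 * z * z)


def min_posterior_null(p_two_sided: float, prior_null: float = 0.5) -> float:
    """Lower bound on P(H0 | data) given a prior P(H0) (assumed 0.5 here)."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = prior_odds * min_likelihood_ratio(p_two_sided)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        lr = min_likelihood_ratio(p)
        post = min_posterior_null(p)
        print(f"p = {p}: min LR = {lr:.4f}, min P(H0 | data) = {post:.4f}")
```

Under these assumptions, p = 0.05 corresponds to a minimum likelihood ratio of roughly 0.15, so even in the case most favorable to the alternative the data count against the null far less strongly than the figure 0.05 might suggest.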

## 205 Citations

### Hail the impossible: p-values, evidence, and likelihood.

- Psychology, Scandinavian journal of psychology
- 2011

Using p in the Fisherian sense as a measure of statistical evidence is deeply problematic, both statistically and conceptually, while the Neyman-Pearson interpretation is not about evidence at all.

### To P or not to P: on the evidential nature of P-values and their place in scientific inference

- Medicine
- 2013

It is shown that P-values quantify experimental evidence not by their numerical value, but through the likelihood functions that they index.

### Statistical Significance and the Dichotomization of Evidence

- Psychology
- 2017

In light of recent concerns about reproducibility and replicability, the ASA issued a Statement on Statistical Significance and p-values aimed at those who are not primarily statisticians.…

### Abandon Statistical Significance

- Computer Science
- 2019

This work recommends dropping the NHST paradigm—and the p-value thresholds intrinsic to it—as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences and argues that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures.

### P values are only an index to evidence: 20th- vs. 21st-century statistical science.

- Computer Science, Ecology
- 2014

The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology.

### Bayes factor and posterior probability: Complementary statistical evidence to p-value.

- Mathematics, Contemporary clinical trials
- 2015

### Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values

- Psychology, The American Statistician
- 2019

The present note explores sources of misplaced criticisms of P-values, such as conflicting definitions of “significance levels” and “P-values” in authoritative sources, and the consequent…

### Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence

- Psychology, Manag. Sci.
- 2016

Dichotomization of evidence is reduced though still present when researchers are asked to make decisions based on the evidence, particularly when the decision outcome is personally consequential.

### Time to dispense with the p-value in OR?

- Economics, Central Eur. J. Oper. Res.
- 2018

P-values are an inadequate choice for a succinct executive summary of statistical evidence for or against a research question, and in statistical summaries confidence intervals of standardized effect sizes provide much more information than p-values without requiring much more space.

## References

SHOWING 1-10 OF 109 REFERENCES

### p values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate.

- Psychology, American journal of epidemiology
- 1993

An analysis using another method promoted by Fisher, mathematical likelihood, shows that the p value substantially overstates the evidence against the null hypothesis.

### Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence

- Mathematics
- 1987

The problem of testing a point null hypothesis (or a “small interval” null hypothesis) is considered. Of interest is the relationship between the P value (or observed significance level) and…

### Confusion Over Measures of Evidence (p's) Versus Errors (α's) in Classical Statistical Testing

- Psychology
- 2003

Confusion surrounding the reporting and interpretation of results of classical statistical tests is widespread among applied researchers, most of whom erroneously believe that such tests are…

### P Values are not Error Probabilities

- Psychology
- 2003

Confusion surrounding the reporting and interpretation of results of classical statistical tests is widespread among applied researchers. The confusion stems from the fact that most of these…

### If Statistical Significance Tests are Broken/Misused, What Practices Should Supplement or Replace Them?

- Psychology
- 1999

Given some consensus that statistical significance tests are broken, misused or at least have somewhat limited utility, the focus of discussion within the field ought to move beyond additional…

### The appropriate use of null hypothesis testing.

- Psychology
- 1996

The many criticisms of null hypothesis testing suggest when it is not useful and what it should not be used for. This article explores when and why its use is appropriate. Null hypothesis testing is…

### The Historical Growth of Statistical Significance Testing in Psychology--and Its Future Prospects.

- Psychology
- 2000

The historical growth in the popularity of statistical significance testing is examined using a random sample of annual data from 12 American Psychological Association (APA) journals. The results…

### P Values: What They are and What They are Not

- Mathematics
- 1996

P values (or significance probabilities) have been used in place of hypothesis tests as a means of giving more information about the relationship between the data and the hypothesis than…

### Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers

- Psychology
- 1996

Data analysis methods in psychology still emphasize statistical significance testing, despite numerous articles demonstrating its severe deficiencies. It is now possible to use meta-analysis to show…

### Null hypothesis significance testing: a review of an old and continuing controversy.

- Psychology, Psychological methods
- 2000

The concluding opinion is that NHST is easily misunderstood and misused but that when applied with good judgment it can be an effective aid to the interpretation of experimental data.