Alphabet Soup

  • Raymond Hubbard
  • Published 1 June 2004
  • Psychology
  • Theory & Psychology, pp. 295-327
Confusion over the reporting and interpretation of results of commonly employed classical statistical tests is recorded in a sample of 1,645 papers from 12 psychology journals for the period 1990 through 2002. The confusion arises because researchers mistakenly believe that their interpretation is guided by a single unified theory of statistical inference. But this is not so: classical statistical testing is a nameless amalgamation of the rival and often contradictory approaches developed by… 


Tests of Statistical Significance Made Sound

  • B. Haig
  • Psychology
    Educational and psychological measurement
  • 2017
It is suggested that to correct for the deficiencies of the hybrid, psychology avail itself of two important and more recent viewpoints on ToSS, namely the neo-Fisherian and the error-statistical perspectives.

The statistical theories of Fisher and of Neyman and Pearson: A methodological perspective

Most of the debates around statistical testing suffer from a failure to identify clearly the features specific to the theories invented by Fisher and by Neyman and Pearson. These features are

Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations

The theoretical origins of NHST, which are mostly absent from standard statistical textbooks, are introduced to the scientometric community, and some of the most prevalent problems relating to the practice are discussed and traced back to the mix-up of the two different theoretical origins.

Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence

Dichotomization of evidence is reduced though still present when researchers are asked to make decisions based on the evidence, particularly when the decision outcome is personally consequential.

Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing

Reporting p values from statistical significance tests is common in psychology's empirical literature. Sir Ronald Fisher saw the p value as playing a useful role in knowledge development by acting as

Détente: A Practical Understanding of P values and Bayesian Posterior Probabilities

  • S. Ruberg
  • Psychology
    Clinical pharmacology and therapeutics
  • 2020
The fundamental differences between NHST and Bayesian approaches are explained, and how they can co‐exist harmoniously to guide clinical trial design and inference is demonstrated.

“Repeated sampling from the same population?” A critique of Neyman and Pearson’s responses to Fisher

  • Mark Rubin
  • Psychology
    European Journal for Philosophy of Science
  • 2020
Fisher (1945a, 1945b, 1955, 1956, 1960) criticised the Neyman-Pearson approach to hypothesis testing by arguing that it relies on the assumption of “repeated sampling from the same population.”

Design sensitivity and statistical power in acceptability judgment experiments

The goals of the current study are to provide a fuller picture of the status of acceptability judgment data in syntax, and to provide detailed information that syntacticians can use to design and evaluate the sensitivity of acceptability judgment experiments in their own research.

Significance tests as sorcery: Science is empirical—significance tests are not

Since the 1930s, many of our top methodologists have argued that significance tests are not conducive to science. Bakan (1966) believed that “everyone knows this” and that we slavishly lean on the

Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing

History enables students, practitioners, and statisticians to treat the discipline as an ongoing endeavor, crafted by fallible humans, and provides a deeper understanding of the subject and its consequences for science and society.



Evidence, Inference, and the “Rejection” of the Significance Test

Hammond (1996) reiterates Cohen's (1994) “attack” on simple-minded interpretations of significance tests and recommends the use of other statistical methods (including effect size measures and

The fallacy of the null-hypothesis significance test.

To the experimental scientist, statistical inference is a research instrument, a processing device by which unwieldy masses of raw data may be refined into a product more suitable for assimilation into the corpus of science, and in this lies both strength and weakness.

Scientific versus Statistical Inference

We argue that the goals of scientists in data analysis and scientific communication do not match the logic of hypothesis testing as it is typically taught in introductory statistics courses.


In an earlier paper* we have endeavoured to emphasise the importance of placing in a logical sequence the stages of reasoning adopted in the solution of certain statistical problems, which may be

The Spread of Statistical Significance Testing in Psychology

Because the widespread use of statistical significance testing has deleterious consequences for the development of a cumulative knowledge base, the American Psychological Association's Board of

p values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate.

  • S. Goodman
  • Psychology
    American journal of epidemiology
  • 1993
An analysis using another method promoted by Fisher, mathematical likelihood, shows that the p value substantially overstates the evidence against the null hypothesis.


To begin with, Neyman and Pearson agreed with Fisher that the result in a hypothesis test is a measure of evidence. In their first joint paper, which was published in 1928, they declared that the

Frequentist probability and frequentist statistics

The stimulus is multiple: letters from friends calling my attention to a dispute in journal articles, in letters to editors, and in books, about what is described as 'the Neyman-Pearson school' and particularly what is described as Neyman's 'radical' objectivism.

Colloquium on Effect Sizes: the Roles of Editors, Textbook Authors, and the Publication Manual

Reformers have long argued that misuse of Null Hypothesis Significance Testing (NHST) is widespread and damaging. The authors analyzed 150 articles from the Journal of Applied Psychology (JAP)

Rejoinder: Editorial Policies Regarding Statistical Significance Tests: Further Comments

In this response to Robinson and Levin’s comments on Thompson (1996), it is argued that describing results as “significant” rather than “statistically significant” is confusing to those persons most