P-Curve: A Key to the File Drawer

@article{Simonsohn2013PCurveAK,
  title={P-Curve: A Key to the File Drawer},
  author={Uri Simonsohn and Leif D. Nelson and Joseph P. Simmons},
  journal={Journal of Experimental Psychology: General},
  year={2013}
}
Because scientists tend to report only studies (publication bias) or analyses (p-hacking) that "work," readers must ask, "Are these effects true, or do they merely reflect selective reporting?" We introduce p-curve as a way to answer this question. P-curve is the distribution of statistically significant p values for a set of studies (ps < .05). Because only true effects are expected to generate right-skewed p-curves, containing more low (.01s) than high (.04s) significant p values, only right…
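A minimal simulation sketch of the property the abstract describes (not the authors' code; the two-sample t-test setup, n = 20 per group, and the effect sizes d = 0.0 and d = 0.5 are illustrative assumptions):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def significant_pvalues(true_d, n_per_group=20, n_studies=10000):
        """Run many two-sample t-tests and keep only the significant p values (p < .05)."""
        pvals = []
        for _ in range(n_studies):
            a = rng.normal(0.0, 1.0, n_per_group)
            b = rng.normal(true_d, 1.0, n_per_group)
            p = stats.ttest_ind(a, b).pvalue
            if p < 0.05:
                pvals.append(p)
        return np.array(pvals)

    for d in (0.0, 0.5):  # null effect vs. a medium true effect (Cohen's d)
        ps = significant_pvalues(d)
        # share of significant p values in each .01-wide bin of the p-curve
        shares = np.histogram(ps, bins=[0, .01, .02, .03, .04, .05])[0] / len(ps)
        print(f"d = {d}: p-curve bin shares {np.round(shares, 2)}")

    # Expected pattern: d = 0.0 gives a roughly flat p-curve (about 0.2 per bin),
    # while d = 0.5 puts far more mass below .01 than near .05 (right skew).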

Detecting Evidential Value and p-Hacking With the p-Curve Tool: A Word of Caution

It is shown that not only selective reporting but also selective nonreporting of significant results due to a significant outcome of a more popular alternative test of the same hypothesis may produce left-skewed p-curves, even if all studies reflect true effects.

P-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results

Journals tend to publish only statistically significant evidence, creating a scientific record that markedly overstates the size of effects. We provide a new tool that corrects for this bias without

p-Curve and p-Hacking in Observational Research

The p-curve for observational research in the presence of p-hacking is analyzed, and it is shown that even with minimal omitted-variable bias (e.g., unaccounted confounding), p-curves based on true effects and p-curves based on null effects with p-hacking cannot be reliably distinguished.

Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value

It is concluded that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis.

Some properties of p-curves, with an application to gradual publication bias.

The results of 2 survey experiments support the existence of a cliff effect at p = .05 and suggest that researchers tend to be more likely to recommend submission of an article as the level of statistical significance increases beyond this p level.

Problems in using text-mining and p-curve analysis to detect rate of p-hacking

Use of ghost variables, a form of p-hacking in which the experimenter tests many variables, is simulated; its effect on the p-curve is random rather than systematic, and the results suggest that the potential for systematic bias in text-mined data is substantial and invalidates conclusions about p-hacking based on p values obtained by text-mining.

The Extent and Consequences of P-Hacking in Science

It is suggested that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses, and its effect seems to be weak relative to the real effect sizes being measured.

Don't Let the Truth Get in the Way of a Good Story: An Illustration of Citation Bias in Epidemiologic Research

Research on job strain and the risk of coronary heart disease is used to examine factors that influence citations in the peer-reviewed literature, taking into account the impact factor of the publishing journal, an indicator of its prestige.
...

References

Showing 1-10 of 60 references.

The file drawer problem and tolerance for null results

Quantitative procedures for computing the tolerance for filed and future null results are reported and illustrated, and the implications are discussed.

Inappropriate Fiddling with Statistical Analyses to Obtain a Desirable P-value: Tests to Detect its Presence in Published Literature

This article presents a method for detecting the presence of manipulation of statistical analyses to push a “near significant p-value” to a level that is considered significant in a distribution of p-values from independent studies.

Publication decisions revisited: the effect of the outcome of statistical tests on the decision to publish and vice versa

Evidence is presented that published results of scientific investigations are not a representative sample of the results of all scientific studies, and it is indicated that practices leading to publication bias have not changed over a period of 30 years.

A fail-safe N for effect size in meta-analysis.

Rosenthal's (1979) concept of fail-safe N has thus far been applied to probability levels exclusively. This note introduces a fail-safe N for effect size. Rosenthal's (1979) fail-safe N was an
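For orientation, the standard forms of these two quantities can be stated as follows (taken from the general meta-analysis literature, not quoted from this note; the notation is illustrative):

    % Rosenthal's (1979) fail-safe N: the number of unpublished null (Z = 0) studies
    % needed to lift a combined one-tailed test above p = .05, given k significant studies.
    N_{fs} = \frac{\left(\sum_{i=1}^{k} Z_i\right)^{2}}{1.645^{2}} - k
    % Effect-size analogue: the number of null studies needed to pull the mean effect
    % \bar{d} of k studies down to a minimally meaningful criterion value d_c.
    N_{fs} = \frac{k\,(\bar{d} - d_c)}{d_c}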

A peculiar prevalence of p values just below .05

In null hypothesis significance testing (NHST), p values are judged relative to an arbitrary threshold for significance (.05). The present work examined whether that standard influences the

Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better

G. Cumming, Perspectives on Psychological Science, 2008
p is so unreliable and gives such dramatically vague information that it is a poor basis for inference; researchers should minimize the role of p by using confidence intervals and model-fitting techniques and by adopting meta-analytic thinking.

Why Most Published Research Findings Are False

Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.

Publication Decisions and their Possible Effects on Inferences Drawn from Tests of Significance—or Vice Versa

Abstract There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published. Such research being unknown to

Is the Replicability Crisis Overblown? Three Arguments Examined

H. Pashler, C. Harris, Perspectives on Psychological Science, 2012
It is argued that there are no plausible concrete scenarios to back up such forecasts and that what is needed is not patience, but rather systematic reforms in scientific practice.

Misleading funnel plot for detection of bias in meta-analysis.

...