P-Curve: A Key to the File Drawer

@article{Simonsohn2014PCurveAK,
  title={P-Curve: A Key to the File Drawer},
  author={Uri Simonsohn and Leif D. Nelson and Joseph P. Simmons},
  journal={Journal of Experimental Psychology: General},
  year={2014}
}
Because scientists tend to report only studies (publication bias) or analyses (p-hacking) that "work," readers must ask, "Are these effects true, or do they merely reflect selective reporting?" We introduce p-curve as a way to answer this question. P-curve is the distribution of statistically significant p values for a set of studies (ps < .05). Because only true effects are expected to generate right-skewed p-curves, containing more low (.01s) than high (.04s) significant p values, only right…
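As a rough illustration of the claim in the abstract, the sketch below (my own simulation, not the authors' code; the function names, sample sizes, and effect sizes are illustrative assumptions) builds three p-curves from simulated two-sample t-tests: one from a true effect, one from an honest null, and one from a null effect that is p-hacked by optional stopping.

```python
# Minimal p-curve simulation sketch (illustrative only, not the authors' code):
# keep only the significant p values (p < .05) from many simulated two-sample
# studies and look at the shape of their distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_study_pvalue(effect_size, n_per_group=20):
    """p value from a single two-sample t-test."""
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(effect_size, 1.0, n_per_group)
    return stats.ttest_ind(a, b).pvalue

def phacked_pvalue(start_n=10, max_n=100, step=5):
    """Null effect with optional stopping: keep adding observations and re-testing
    until p < .05 or the sample budget runs out (one common form of p-hacking)."""
    a = list(rng.normal(0.0, 1.0, start_n))
    b = list(rng.normal(0.0, 1.0, start_n))
    while True:
        p = stats.ttest_ind(a, b).pvalue
        if p < .05 or len(a) >= max_n:
            return p
        a.extend(rng.normal(0.0, 1.0, step))
        b.extend(rng.normal(0.0, 1.0, step))

def pcurve_shape(pvals):
    """Share of the significant p values below .01 vs. between .04 and .05."""
    sig = np.array([p for p in pvals if p < .05])
    return np.mean(sig < .01), np.mean(sig > .04)

n_sims = 5000
curves = {
    "true effect (d = 0.5)": [one_study_pvalue(0.5) for _ in range(n_sims)],
    "null effect (d = 0)":   [one_study_pvalue(0.0) for _ in range(n_sims)],
    "null effect, p-hacked": [phacked_pvalue() for _ in range(n_sims)],
}
for label, pvals in curves.items():
    low, high = pcurve_shape(pvals)
    print(f"{label}: {low:.2f} of significant ps < .01, {high:.2f} between .04 and .05")
```

In a typical run under these assumptions, the true-effect curve is right-skewed (far more significant ps below .01 than between .04 and .05), the honest null curve is roughly flat (about one fifth in each bin), and the p-hacked null curve tilts toward .05, which is the diagnostic pattern the paper builds on.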
Detecting Evidential Value and p-Hacking With the p-Curve Tool: A Word of Caution
TLDR
It is shown that not only selective reporting but also selective nonreporting of significant results due to a significant outcome of a more popular alternative test of the same hypothesis may produce left-skewed p-curves, even if all studies reflect true effects.
P-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results
Journals tend to publish only statistically significant evidence, creating a scientific record that markedly overstates the size of effects. We provide a new tool that corrects for this bias without … (a toy sketch of the general idea appears after this list).
p-Curve and p-Hacking in Observational Research
TLDR
The p-curve for observational research in the presence of p-hacking is analyzed, and it is shown that even with minimal omitted-variable bias (e.g., unaccounted confounding), p-curves based on true effects and p-curves based on null effects with p-hacking cannot be reliably distinguished.
Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value
TLDR
It is concluded that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis.
Some properties of p-curves, with an application to gradual publication bias.
TLDR
The results of 2 survey experiments support the existence of a cliff effect at p = .05 and suggest that researchers tend to be more likely to recommend submission of an article as the level of statistical significance increases beyond this p level.
The Extent and Consequences of P-Hacking in Science
TLDR
It is suggested that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses, and its effect seems to be weak relative to the real effect sizes being measured.
Z-Curve 2.0: Estimating Replication Rates and Discovery Rates
Publication bias, the fact that published studies are not necessarily representative of all conducted studies, poses a significant threat to the credibility of scientific literature. To mitigate the
Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a Reply to Ulrich and Miller (2015).
TLDR
This work considers the possibility that researchers report only the smallest significant p value, as well as the impact of more common problems, including p-curvers selecting the wrong p values, fake data, honest errors, and ambitiously p-hacked results, and provides practical solutions that substantially increase p-curve's robustness.
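The "p-Curve and Effect Size" entry above describes correcting effect-size estimates using only significant results. The following is a toy reconstruction of the general idea under simple assumptions (two-sample t-tests with equal, known group sizes); it is not the authors' published implementation, and the names and parameters are mine: choose the candidate effect size under which the observed significant t values, rescaled into conditional survival probabilities, look most uniform.

```python
# Toy sketch (illustrative assumptions only): estimate an effect size from the
# *significant* results alone by finding the candidate effect size under which the
# observed significant t values behave like draws from a truncated noncentral t.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

TRUE_D, N_PER_GROUP = 0.5, 20
DF = 2 * N_PER_GROUP - 2
T_CRIT = stats.t.ppf(0.975, DF)                 # two-sided .05 cutoff

# Simulate studies of a true effect and keep only significant, correctly signed
# t values (the "published" record under publication bias).
t_obs = []
while len(t_obs) < 200:
    control = rng.normal(0.0, 1.0, N_PER_GROUP)
    treated = rng.normal(TRUE_D, 1.0, N_PER_GROUP)
    t = stats.ttest_ind(treated, control).statistic
    if t > T_CRIT:
        t_obs.append(t)
t_obs = np.array(t_obs)

def loss(d):
    """KS distance from uniformity of P(T > t_obs | T > T_CRIT) under candidate d;
    this is near zero when d matches the effect that generated the significant results."""
    ncp = d * np.sqrt(N_PER_GROUP / 2)          # noncentrality of the two-sample t-test
    pp = stats.nct.sf(t_obs, DF, ncp) / stats.nct.sf(T_CRIT, DF, ncp)
    return stats.kstest(pp, "uniform").statistic

grid = np.linspace(0.0, 1.5, 301)
d_hat = grid[np.argmin([loss(d) for d in grid])]
naive = np.mean(t_obs) * np.sqrt(2 / N_PER_GROUP)   # average d among significant studies only
print(f"true d = {TRUE_D}, naive estimate = {naive:.2f}, corrected estimate = {d_hat:.2f}")
```

Because only significant studies enter the record, the naive average overstates the effect substantially in a typical run, while the corrected estimate lands near the simulated 0.5; that gap is the bias such a tool is designed to remove.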

References

Showing 1-10 of 73 references
The file drawer problem and tolerance for null results
TLDR
Quantitative procedures for computing the tolerance for filed and future null results are reported and illustrated, and the implications are discussed.
Inappropriate Fiddling with Statistical Analyses to Obtain a Desirable P-value: Tests to Detect its Presence in Published Literature
TLDR
This article presents a method for detecting the presence of manipulation of statistical analyses to push a “near significant p-value” to a level that is considered significant in a distribution of p-values from independent studies.
Publication bias in situ (C. Phillips, BMC Medical Research Methodology, 2004)
TLDR
Examples are presented that show how easily PBIS can have a large impact on reported results, as well as how there can be no simple answer to it.
Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa
TLDR
Evidence is presented that published results of scientific investigations are not a representative sample of the results of all scientific studies, and it is indicated that the practices leading to publication bias have not changed over a period of 30 years.
A fail-safe N for effect size in meta-analysis.
Rosenthal's (1979) concept of fail-safe N has thus far been applied to probability levels exclusively. This note introduces a fail-safe N for effect size. Rosenthal's (1979) fail-safe N was an
Tests of Significance for 2 × 2 Contingency Tables
A peculiar prevalence of p values just below .05
In null hypothesis significance testing (NHST), p values are judged relative to an arbitrary threshold for significance (.05). The present work examined whether that standard influences the
Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better (G. Cumming, Perspectives on Psychological Science, 2008)
TLDR
P is so unreliable and gives such dramatically vague information that it is a poor basis for inference; researchers should minimize the role of p by using confidence intervals and model-fitting techniques and by adopting meta-analytic thinking.
A Primer on the Understanding, Use, and Calculation of Confidence Intervals that are Based on Central and Noncentral Distributions
Reform of statistical practice in the social and behavioral sciences requires wider use of confidence intervals (CIs), effect size measures, and meta-analysis. The authors discuss four reasons for
Why Most Published Research Findings Are False
TLDR
Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.