Statistical power and significance testing in large-scale genetic studies

@article{Sham2014StatisticalPA,
  title={Statistical power and significance testing in large-scale genetic studies},
  author={Pak Chung Sham and Shaun M Purcell},
  journal={Nature Reviews Genetics},
  year={2014},
  volume={15},
  pages={335--346}
}
Significance testing was developed as an objective method for summarizing statistical evidence for a hypothesis. It has been widely adopted in genetic studies, including genome-wide association studies and, more recently, exome sequencing studies. However, significance testing in both genome-wide and exome-wide studies must adopt stringent significance thresholds to allow for multiple testing, and it is useful only when studies have adequate statistical power, which depends on the characteristics…
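The link the abstract draws between stringent thresholds and statistical power can be illustrated with a minimal sketch. It is not taken from the paper: the Bonferroni correction, the figure of one million tests, the two-sided z-test approximation, and the function names `bonferroni_threshold` and `two_sided_power` are all assumptions chosen for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def two_sided_power(effect_z: float, alpha: float) -> float:
    """Power of a two-sided z-test.

    effect_z is the expected value of the test statistic under the
    alternative (hypothetical input; in practice it grows with sample
    size and effect size).
    """
    # Critical value, computed from the lower tail for numerical accuracy.
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2)
    # Probability that the statistic lands in either rejection region.
    return _STD_NORMAL.cdf(effect_z - z_crit) + _STD_NORMAL.cdf(-effect_z - z_crit)


if __name__ == "__main__":
    # Assumed for illustration: 0.05 family-wise error, one million tests.
    genome_wide_alpha = bonferroni_threshold(0.05, 1_000_000)
    for effect_z in (3.0, 5.0, 7.0):
        nominal = two_sided_power(effect_z, 0.05)
        stringent = two_sided_power(effect_z, genome_wide_alpha)
        print(f"effect_z={effect_z}: power at 0.05 = {nominal:.3f}, "
              f"power at {genome_wide_alpha:.1e} = {stringent:.3f}")
```

Under these assumptions, the same expected test statistic yields much lower power at the stringent threshold than at the nominal 0.05 level, which is why adequately powered designs matter in large-scale studies.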
Powerful Test Based on Conditional Effects for Genome-Wide Screening.
TLDR
This paper considers testing procedures for screening large genome-wide data and proposes a new test that is based on conditional effects from multiple SNPs, which is shown to be more powerful than the minimum p-value method and clearly outperforms the other methods in the literature.
Power and Sample Size Calculations for Genetic Association Studies in the Presence of Genetic Model Misspecification
TLDR
Understanding the impact of model misspecification can aid in study design and developing analysis plans that maximize power to detect a range of true underlying genetic effects, and these calculations help identify when a multiple degree of freedom test or other robust test of association may be advantageous.
A simple and accurate method to determine genomewide significance for association tests in sequencing studies
  • D. Lin
  • Biology, Genetic Epidemiology
  • 2019
TLDR
A simple and accurate method based on parametric bootstrap is presented to assess genomewide significance, and it is shown that the correlations of the test statistics are determined primarily by the genotypes, such that the same significance threshold can be used in different studies that share a common sequencing platform.
The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants
Genome-wide association studies (GWAS) have long relied on proposed statistical significance thresholds to be able to differentiate true positives from false positives. Although the genome-wide
Quick approximation of threshold values for genome-wide association studies
TLDR
High similarities in the critical thresholds between the accurate and approximate estimations were demonstrated by extensive simulations and real data analysis, and hypothesis testing when a nuisance parameter is present only under the alternative was introduced to quickly approximate the critical thresholds of these test statistics for GWASs.
Multiple Hypothesis Testing in a Genome Wide Association Study of Bovine Tuberculosis
TLDR
The results of this study showed that multiple hypothesis testing procedures are related to false positive genomic signals.
GWAS Significance Thresholds for Deep Phenotyping Studies Can Depend Upon Minor Allele Frequencies and Sample Size
TLDR
The threshold to find a particular level of family-wise significance may need to be established using separate permutations of the actual data for several MAF bins, and it is proposed that the permutation threshold is influenced by minor allele frequency of the SNPs, and by the number of individuals tested.
Opportunities and challenges for the use of common controls in sequencing studies.
TLDR
Challenges and opportunities for the robust use of common controls in high-throughput sequencing studies are discussed, including study design, quality control and statistical approaches.
Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests
TLDR
A new methodology is introduced that enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.
Covariate Selection for Association Screening in Multi-Phenotype Genetic studies
TLDR
This work developed covariates for multiphenotype studies (CMS), an approach that improves power when correlated phenotypes are measured on the same samples, and analyses of real and simulated data provide direct evidence that correlated phenotypes can be used to achieve increases in power to levels often surpassing the power gained by a twofold increase in sample size.
...
...

References

Genome‐wide significance for dense SNP and resequencing data
TLDR
This work approximates genome‐wide significance thresholds in contemporary West African, East Asian and European populations by simulating sequence data, based on all polymorphisms as well as for a range of single nucleotide polymorphism (SNP) selection criteria, and finds that significance thresholds vary by a factor of >20 over the SNP selection criteria and statistical tests that it considers.
Estimation of the multiple testing burden for genomewide association studies of nearly all common variants
TLDR
The task of developing standards for genomewide significance is undertaken, based on data collected by the International Haplotype Map Consortium, and the sensitivity of the testing burden to the required significance level is identified.
A tutorial on statistical methods for population association studies
TLDR
An overview of statistical approaches to population association studies, including preliminary analyses (Hardy–Weinberg equilibrium testing, inference of phase and missing data, and SNP tagging), and single-SNP and multipoint tests for association.
Genome-wide association studies: theoretical and practical concerns
TLDR
The main factors — including models of the allelic architecture of common diseases, sample size, map density and sample-collection biases — that need to be taken into account in order to optimize the cost efficiency of identifying genuine disease-susceptibility loci are outlined.
Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets
TLDR
A more robust and faster method is presented to calculate the effective number of independent markers (Me) for the adjustment of multiple testing, and a p-value threshold of ~10^−7 is suggested as the criterion for genome-wide significance for early commercial genotyping arrays.
Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies
TLDR
By approximating the effective number of independent SNPs across the genome, the authors are able to 'correct' for a more accurate number of tests and develop 'LD adjusted' Bonferroni corrected p-value thresholds that account for the interdependence of SNPs on well-utilized commercially available SNP "chips".
Comparative study of statistical methods for detecting association with rare variants in exome-resequencing data
TLDR
The results show that collapsing methods are promising tools, but the type I error rate was not well controlled and the single-locus association approaches may not be affected to the same extent by population stratification.
Bayesian statistical methods for genetic association studies
TLDR
These methods are reviewed, focusing on single-SNP tests in genome-wide association studies, and the use of Bayesian methods for fine mapping in candidate regions is demonstrated, and guidance for refereeing manuscripts that contain Bayesian analyses is provided.
...
...