Common statistical issues in genome-wide association studies: a review on power, data quality control, genotype calling and population structure

  • Yik Ying Teo
  • Published 1 April 2008
  • Biology
  • Current Opinion in Lipidology
Purpose of review: Genetic association studies which survey the entire genome have become a common design for uncovering the genetic basis of common diseases, including lipid-related traits. Such studies have identified several novel loci which influence blood lipids. The present review highlights the statistical challenges associated with such large-scale genetic studies and discusses the available methodological strategies for handling these issues.
Recent findings: The successful analysis of…
It is suggested that regions with more variability might have structural characteristics that make them difficult to scan during the genotyping process, and that researchers could lower their false-positive rates to avoid spuriously significant results.
Evaluating variations of genotype calling: a potential source of spurious associations in genome-wide association studies
Comparative analysis of the calling results and the corresponding lists of significantly associated SNPs identified through association analysis revealed that algorithmic parameters used in BRLMM affected the genotype calls and the significantly associated SNPs.
Genetic model selection in genome-wide association studies: robust methods and the use of meta-analysis
  • P. Bagos
  • Biology
  • Statistical Applications in Genetics and Molecular Biology
  • 2013
This review presents a comprehensive summary of the statistical methods used for robust analysis and genetic model selection in GAS and GWAS and discusses the application of such methods in the context of meta-analysis.
Planning and executing a genome wide association study (GWAS).
This chapter focuses on a number of key elements that require consideration for the successful conduct of a genome-wide association study (GWAS), and reflects on ethical considerations, study design, selection of phenotype/s, power considerations, sample tracking and storage issues, and genotyping product selection.
Genome‐wide Association Studies
The genome-wide association approach has become a reality as a result of significant advances in genomic resources such as the International HapMap Project, the high-throughput genotyping
Genome-wide Association: From Confounded to Confident
  • J. Glessner, H. Hakonarson
  • Biology
  • The Neuroscientist: a review journal bringing neurobiology, neurology and psychiatry
  • 2011
Proper and responsible study design, followed by rigorous data quality assessment of genomic matching of cases and controls, is most likely to uncover regions of significant association that replicate in independent cohorts, thereby maximizing the chance of significant and confident association.
Evaluation of Clustering and Genotype Distribution for Replication in Genome Wide Association Studies: The Age-Related Eye Disease Study
Characteristics of the genomic loci associated with a trait could be used to identify initial associations with a higher chance of replication in a second cohort to assess correlation between traits and DNA sequence variation using large numbers of genetic variants.
On Quality Control Measures in Genome-Wide Association Studies: A Test to Assess the Genotyping Quality of Individual Probands in Family-Based Association Studies and an Application to the HapMap Data
A transmission test based on allele transmissions in pedigrees is presented; it identifies probands with insufficient genotyping quality that were not removed by standard quality control filtering, and it is ideally suited as the final layer of quality control filters in the cleaning process of genome-wide association studies.
Comparing Four Genome-Wide Association Study (GWAS) Programs with Varied Input Data Quantity
This study investigates how input data quantity influences the output of four widely used GWAS programs (PLINK, TASSEL, GAPIT, and FaST-LMM) and provides guidance on selecting GWAS programs when varied experimental data are present and on selecting significant SNPs for subsequent study.
Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure
The main characteristics of R packages, command-line tools and desktop applications, both free and commercial, are described to help make the most of a large amount of publicly available SNP data.


Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies
It is demonstrated that the alternative strategy of jointly analyzing the data from both stages almost always results in increased power to detect genetic association, despite the need to use more stringent significance levels, even when effect sizes differ between the two stages.
The effects of human population structure on large genetic association studies
The consequences of population structure on association outcomes increase markedly with sample size, and one method for correcting for population structure (Genomic Control) is examined, which may not correct for structure if too few loci are used and may overcorrect in other settings, leading to substantial loss of power.
Efficiency and power in genetic association studies
A haplotype-based tagging method is demonstrated that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency, and is robust to the completeness of the reference panel from which tags are selected.
An R Package for Analysis of Whole-Genome Association Studies
Data classes in which each genotype call is stored as a single byte are implemented to facilitate the analysis of whole genome association studies in the R language for statistical computing.
A new multipoint method for genome-wide association studies by imputation of genotypes
This work proposes a coherent analysis framework that treats the genome-wide association problem as one involving missing or uncertain genotypes, and proposes a model-based imputation method for inferring genotypes at observed or unobserved SNPs, leading to improved power over existing methods for multipoint association mapping.
Genome-wide strategies for detecting multiple loci that influence complex diseases
Analytical methods that explicitly look for statistical interactions between loci are shown to be computationally feasible, even for studies of hundreds of thousands of loci, and to be more powerful than traditional analyses under a range of models for interlocus interactions.
Association mapping in structured populations.
This article describes a novel, statistically valid, method for case-control association studies in structured populations that uses a set of unlinked genetic markers to infer details of population structure, and to estimate the ancestry of sampled individuals, before using this information to test for associations within subpopulations.
Genomic Control for Association Studies
The performance of the genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.
Evaluating coverage of genome-wide association studies
It is shown that although many commercially available genome-wide genotyping products provide substantial coverage of common variation in non-African populations, the precise extent is strongly dependent on the frequencies of alleles of interest and on specific considerations of study design.
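
Several entries above concern Genomic Control as a correction for population structure. As a rough illustration only, and not the implementation from any of the listed papers, the basic idea can be sketched in Python: estimate an inflation factor λ as the median of the observed 1-df chi-square association statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.455), then divide every statistic by λ. The function name `genomic_control` and the choice to floor λ at 1 are assumptions made for this sketch.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda_gc, corrected_stats) for a list of 1-df chi-square statistics.

    Hypothetical helper for illustration; assumes the statistics come from
    markers that are mostly unassociated with the trait.
    """
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    # Inflation factor: observed median relative to the null median.
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumption: never inflate the statistics, so lambda is floored at 1.
    lam = max(lam, 1.0)
    corrected = [x / lam for x in chi2_stats]
    return lam, corrected


if __name__ == "__main__":
    # Toy statistics whose median is exactly twice the null median.
    lam, corrected = genomic_control([0.9098728, 0.9098728, 0.9098728])
    print(round(lam, 3), [round(c, 4) for c in corrected])
```

As the summaries above note, an estimate of λ based on too few loci can be unreliable, so a sketch like this is only meaningful when many markers contribute to the median.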