Variance component model to account for sample structure in genome-wide association studies

Hyun Min Kang, Jae Hoon Sul, Susan K. Service, Noah A. Zaitlen, Sit-yee Kong, Nelson B. Freimer, Chiara Sabatti, Eleazar Eskin. Nature Genetics.
Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to… 
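The variance component idea can be made concrete with a short sketch. It assumes the standard model y = 1μ + xβ + g + e with Var(y) = σ_g²K + σ_e²I. It builds K as a standardized genetic relationship matrix from 0/1/2 genotypes and tests one SNP by generalized least squares. EMMA itself estimates σ_g² and σ_e² by (restricted) maximum likelihood. Here they are passed in as known values, and the function names `standardized_grm` and `gls_snp_test` are hypothetical.

```python
import numpy as np

def standardized_grm(G):
    """G: n x m matrix of 0/1/2 allele counts. Returns the n x n GRM K = Z Z^T / m."""
    p = G.mean(axis=0) / 2.0                      # allele frequency per marker
    keep = (p > 0) & (p < 1)                      # drop monomorphic markers
    Z = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    return Z @ Z.T / Z.shape[1]

def gls_snp_test(y, x, K, sigma_g2, sigma_e2):
    """Wald test of one SNP under y = mu + x*beta + g + e, Var(y) = sigma_g2*K + sigma_e2*I.
    The variance components are treated as known (an assumption of this sketch)."""
    n = len(y)
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    X = np.column_stack([np.ones(n), x])
    Vinv_X = np.linalg.solve(V, X)                # V^{-1} X
    XtVX = X.T @ Vinv_X                           # X^T V^{-1} X
    beta = np.linalg.solve(XtVX, Vinv_X.T @ y)    # GLS estimate (mu, beta)
    cov = np.linalg.inv(XtVX)                     # covariance of the GLS estimate
    z = beta[1] / np.sqrt(cov[1, 1])
    return beta[1], z
```

With σ_g² = 0 the test reduces to ordinary least squares. The genetic term is what lets the model absorb pairwise relatedness instead of inflating the SNP statistic.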
A resource-efficient tool for mixed model association analysis of large-scale data
An MLM-based tool (fastGWA) is developed that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data.
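A sparse GRM keeps only the entries that reflect close relatedness. The sketch below assumes a dense GRM is already available and drops off-diagonal entries below a cutoff. The 0.05 default and the name `sparsify_grm` are illustrative assumptions, not fastGWA's documented interface.

```python
def sparsify_grm(grm, cutoff=0.05):
    """Return a dict {(i, j): value} holding the diagonal and the upper-triangle
    off-diagonal entries whose value is at least `cutoff` (hypothetical helper)."""
    n = len(grm)
    entries = {}
    for i in range(n):
        entries[(i, i)] = grm[i][i]              # always keep the diagonal
        for j in range(i + 1, n):
            if grm[i][j] >= cutoff:              # keep only close-relative pairs
                entries[(i, j)] = grm[i][j]
    return entries
```

Because most pairs in a large cohort are nearly unrelated, the retained entries grow roughly with the number of relatives rather than with n².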
Robust relationship inference in genome-wide association studies
A rapid algorithm for relationship inference from the high-throughput genotype data typical of GWAS. It allows for unknown population substructure and infers relationships for millions of pairs of individuals in minutes, dozens of times faster than the most efficient existing algorithm.
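Relationship inference of this kind is often summarized by a kinship coefficient computed from genotype counts. The sketch below implements the symmetric KING-robust form, φ = (N_Aa,Aa − 2·N_AA,aa) / (N_Aa(i) + N_Aa(j)), as we understand it. Treat the formula as an assumption to check against the original paper. Genotypes are 0/1/2 allele counts with `None` for missing, and `king_robust_kinship` is a hypothetical name.

```python
def king_robust_kinship(gi, gj):
    """Kinship estimate for two individuals from 0/1/2 genotypes (None = missing),
    using only markers observed in both."""
    het_both = opp_hom = het_i = het_j = 0
    for a, b in zip(gi, gj):
        if a is None or b is None:
            continue
        het_i += a == 1                           # heterozygotes in individual i
        het_j += b == 1                           # heterozygotes in individual j
        het_both += a == 1 and b == 1             # both heterozygous
        opp_hom += {a, b} == {0, 2}               # opposite homozygotes
    denom = het_i + het_j
    if denom == 0:
        raise ValueError("no heterozygous markers in either individual")
    return (het_both - 2 * opp_hom) / denom
```

Self-comparison gives 0.5, and first-degree relatives are expected near 0.25. Negative values can arise for pairs drawn from different populations.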
A mixed-model approach for genome-wide association studies of correlated traits in structured populations
This work extends the linear mixed-model approach to GWAS of correlated phenotypes, deriving a fully parameterized multi-trait mixed model (MTMM) that considers both within-trait and between-trait variance components simultaneously for multiple traits.
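For t traits measured on the same n individuals, a multi-trait mixed model of this kind is usually written with a Kronecker-structured covariance, Var(vec(Y)) = V_g ⊗ K + V_e ⊗ I_n. Here the t×t matrices V_g and V_e hold the genetic and residual (co)variances within and between traits. The sketch assumes that parameterization and evaluates the Gaussian log-likelihood for given V_g and V_e. Fitting them is not shown, and the function names are hypothetical.

```python
import numpy as np

def mtmm_covariance(K, Vg, Ve):
    """Covariance of vec(Y), where Y is n x t and its columns (traits) are stacked:
    Var(vec(Y)) = Vg kron K + Ve kron I_n."""
    n = K.shape[0]
    return np.kron(Vg, K) + np.kron(Ve, np.eye(n))

def mtmm_loglik(Y, K, Vg, Ve):
    """Gaussian log-likelihood of a mean-centered n x t trait matrix Y."""
    n, t = Y.shape
    y = Y.reshape(-1, order="F")                  # trait 1 for all individuals, then trait 2, ...
    S = mtmm_covariance(K, Vg, Ve)
    _, logdet = np.linalg.slogdet(S)              # S is positive definite, so sign is +1
    quad = y @ np.linalg.solve(S, y)
    return -0.5 * (n * t * np.log(2 * np.pi) + logdet + quad)
```

The off-diagonal blocks Vg[a, b]·K + Ve[a, b]·I are what let the model share information between correlated traits.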
Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models
Population structure is shown, both analytically and empirically, to cause spurious gene-by-environment interactions (GEIs). A statistical approach based on mixed models is proposed to account for population structure in GEI statistics, and it effectively controls for population structure in statistics for GEIs as well as for genetic variants.
Fast and flexible linear mixed models for genome-wide genetics
Using both simulated and real data, the authors demonstrate improved accuracy of genetic association tests, increased power to discover causal genetic variants, and accurate summaries of model uncertainty.
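One common way to make mixed-model fitting fast is to rotate the data by the eigenvectors of K. This makes the covariance σ²(h²K + (1 − h²)I) diagonal, so the profile likelihood can be evaluated cheaply over a grid of heritability values h². The sketch assumes that approach and maximum likelihood (not REML). Whether it matches the cited work's algorithm is not confirmed here, and the function names are hypothetical.

```python
import numpy as np

def profile_loglik(h2, y_rot, X_rot, s):
    """ML log-likelihood at heritability h2, with beta and sigma^2 profiled out.
    Inputs are already rotated by the eigenvectors of K; s holds its eigenvalues."""
    d = h2 * s + (1.0 - h2)                       # diagonal of the rotated covariance, up to sigma^2
    w = 1.0 / d
    Xw = X_rot * w[:, None]
    beta = np.linalg.solve(X_rot.T @ Xw, Xw.T @ y_rot)   # weighted least squares
    r = y_rot - X_rot @ beta
    n = len(y_rot)
    sigma2 = np.sum(w * r ** 2) / n               # ML estimate of sigma^2
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + np.sum(np.log(d)) + n)

def grid_search_h2(y, X, K, grid=None):
    """Return the grid value of h2 with the highest profile log-likelihood, and that value."""
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)        # h2 < 1 keeps every d strictly positive
    s, U = np.linalg.eigh(K)                      # one eigendecomposition, reused for every h2
    s = np.clip(s, 0.0, None)                     # guard against tiny negative eigenvalues
    y_rot, X_rot = U.T @ y, U.T @ X
    lls = [profile_loglik(h2, y_rot, X_rot, s) for h2 in grid]
    best = int(np.argmax(lls))
    return float(grid[best]), lls[best]
```

The eigendecomposition costs O(n³) once. Each grid point then costs only O(n) plus a small weighted regression, which is where the speed comes from.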
FDR control in GWAS with population structure
We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing distinct and interpretable discoveries while controlling the false discovery rate…
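To make the FDR criterion concrete, the sketch below implements the classic Benjamini-Hochberg step-up procedure on a list of p-values. This is a swapped-in textbook technique, not necessarily the framework of the cited work. It assumes p-values that are independent or positively dependent, and `benjamini_hochberg` is a hypothetical name.

```python
def benjamini_hochberg(pvalues, q=0.1):
    """Return indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            k_max = rank                          # largest rank with p_(k) <= q*k/m
    return sorted(order[:k_max])                  # reject the k_max smallest p-values
```

The step-up rule rejects every hypothesis up to the largest passing rank, even if some intermediate p-value misses its own threshold.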
Genetic Studies: The Linear Mixed Models in Genome-wide Association Studies
This review summarizes the current literature on sample structure. It focuses on approaches that handle population structure in genome-wide association studies; on linear mixed model (LMM)-based approaches, which have the advantage of capturing multilevel relatedness; and on the unsolved issues and future work for LMM-based approaches.
A Lasso multi-marker mixed model for association mapping with population structure correction
This work proposes LMM-Lasso, a mixed model that allows for both multi-locus mapping and correction for confounding effects. It simultaneously discovers likely causal variants and enables multi-marker-based phenotype prediction from genotype.
Rapid variance components–based method for whole-genome association analysis
Simulations suggest that GRAMMAR-Gamma may be used for association testing in whole-genome resequencing studies of large human cohorts, and has a power close to that of the likelihood ratio test–based method.


Family-based association tests for genomewide association scans.
A computationally efficient approach to testing association between SNPs and quantitative phenotypes, which can be applied to whole-genome association scans and allows estimation of missing genotypes, resulting in substantial increases in power when genotyping resources are limited.
Association studies for quantitative traits in structured populations
This report generalizes genomic control (GC) to quantitative traits (QTs) and multilocus models. It shows that GC controls spurious associations under reasonable settings of population substructure for QT models, including models with gene-gene interaction.
Principal components analysis corrects for stratification in genome-wide association studies
This work describes a method that enables explicit detection and correction of population stratification on a genome-wide scale and uses principal components analysis to explicitly model ancestry differences between cases and controls.
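The principal-components correction can be sketched in a few steps. First, normalize the genotypes. Next, take the top eigenvectors of the sample-by-sample covariance as axes of variation and project them out of both genotypes and phenotype. Finally, score each SNP by (n − k − 1) times the squared correlation of the adjusted values, where k is the number of components removed. The normalization and statistic follow our reading of the EIGENSTRAT description and should be treated as assumptions. The name `pca_adjusted_association` is hypothetical.

```python
import numpy as np

def pca_adjusted_association(G, y, n_pcs=2):
    """G: n x m genotypes (0/1/2); y: length-n phenotype.
    Returns per-SNP 1-df chi-square-like statistics after removing the top PCs."""
    n, m = G.shape
    mu = G.mean(axis=0)
    p = mu / 2.0
    sd = np.sqrt(p * (1 - p))
    sd[sd == 0] = 1.0                             # monomorphic SNPs stay all-zero after centering
    Z = (G - mu) / sd                             # normalized genotypes
    _, evecs = np.linalg.eigh(Z @ Z.T / m)        # eigenvalues come back in ascending order
    A = evecs[:, ::-1][:, :n_pcs]                 # top n_pcs axes, orthonormal columns
    Z_adj = Z - A @ (A.T @ Z)                     # remove ancestry axes from genotypes
    y_c = y - y.mean()
    y_adj = y_c - A @ (A.T @ y_c)                 # ... and from the phenotype
    num = Z_adj.T @ y_adj
    den = np.sqrt(np.sum(Z_adj ** 2, axis=0) * np.sum(y_adj ** 2))
    r = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return (n - n_pcs - 1) * r ** 2
```

With `n_pcs=0` the statistic reduces to (n − 1) times the squared Pearson correlation between each SNP and the phenotype.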
Association mapping in structured populations.
This article describes a novel, statistically valid, method for case-control association studies in structured populations that uses a set of unlinked genetic markers to infer details of population structure, and to estimate the ancestry of sampled individuals, before using this information to test for associations within subpopulations.
Genome-wide association analysis of metabolic traits in a birth cohort from a founder population
The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
Genomic Control for Association Studies
The performance of the genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.
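Genomic control rescales test statistics by an inflation factor estimated from the genome-wide median. The sketch assumes 1-degree-of-freedom chi-square statistics and uses 0.4549 (the median of a χ²₁ distribution) as the reference. It floors λ at 1, which is a common convention rather than something stated here. The name `genomic_control` is hypothetical.

```python
from statistics import median

CHI2_1_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 degree of freedom

def genomic_control(chi2_stats):
    """Estimate the inflation factor lambda and deflate 1-df chi-square statistics."""
    lam = median(chi2_stats) / CHI2_1_MEDIAN
    lam = max(lam, 1.0)                           # assumed convention: never inflate statistics
    return lam, [s / lam for s in chi2_stats]

lam, adjusted = genomic_control([0.2, 0.9, 1.5, 0.6, 3.8])
```

Using the median keeps λ robust to the handful of truly associated loci in the tail.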
Genome-wide strategies for detecting multiple loci that influence complex diseases
Analytical methods that explicitly look for statistical interactions between loci are shown to be computationally feasible, even for studies of hundreds of thousands of loci, and to be more powerful than traditional analyses under a range of models for interlocus interactions.
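An explicit test for statistical interaction between two loci compares models with and without a product term. The sketch assumes a quantitative trait and additive 0/1/2 coding, and uses an F-test on nested linear regressions. For case-control data, a logistic-regression likelihood-ratio test would play the same role. The name `interaction_f_test` is hypothetical.

```python
import numpy as np

def _rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - X @ beta) ** 2))

def interaction_f_test(y, g1, g2):
    """F statistic (1 and n-4 df) comparing y ~ 1 + g1 + g2 + g1*g2
    against the additive model y ~ 1 + g1 + g2."""
    n = len(y)
    ones = np.ones(n)
    X0 = np.column_stack([ones, g1, g2])          # additive (null) model
    X1 = np.column_stack([ones, g1, g2, g1 * g2]) # adds the interaction term
    rss0, rss1 = _rss(X0, y), _rss(X1, y)
    return (rss0 - rss1) / (rss1 / (n - X1.shape[1]))
```

Running this over all SNP pairs is quadratic in the number of loci. Its feasibility therefore rests on each test being as cheap as a couple of small regressions.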
A unified mixed-model method for association mapping that accounts for multiple levels of relatedness
A unified mixed-model approach to account for multiple levels of relatedness simultaneously as detected by random genetic markers is developed and provides a powerful complement to currently available methods for association mapping.
Genotype‐based matching to correct for population stratification in large‐scale case‐control genetic association studies
Computer simulations show that GSM correctly controls false-positive rates and improves power to detect true disease-predisposing variants; compared with genomic control in the same simulations, GSM shows improved power.
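Matching on genotypes can be illustrated with a simple greedy scheme. It scores the genetic similarity between a case and each control by identity-by-state (IBS) sharing, then pairs each case with its most similar unused control. The similarity score, the 1:1 greedy strategy, and the names `ibs_similarity` and `greedy_match` are assumptions made for illustration, not the published GSM procedure.

```python
def ibs_similarity(gi, gj):
    """Mean IBS sharing between two 0/1/2 genotype vectors, scaled to [0, 1]."""
    return sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2 * len(gi))

def greedy_match(cases, controls):
    """Pair each case with its most similar unused control (1:1, without replacement).
    Ties go to the lowest control index."""
    available = set(range(len(controls)))
    matches = {}
    for c, g_case in enumerate(cases):
        if not available:
            break
        best = max(available, key=lambda k: (ibs_similarity(g_case, controls[k]), -k))
        matches[c] = best
        available.remove(best)
    return matches
```

Greedy matching depends on case order. An optimal assignment method would avoid that, at higher cost.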
The genome-wide patterns of variation expose significant substructure in a founder population.