Corpus ID: 234094795

SEAGLE: A Scalable Exact Algorithm for Large-Scale Set-Based GxE Tests in Biobank Data

Jocelyn T. Chi, Ilse C. F. Ipsen, Tzu-Hung Hsiao, Ching-Heng Lin, Li-San Wang, Wan-Ping Lee, Tzu-Pin Lu, Jung-Ying Tzeng
The explosion of biobank data offers immediate opportunities for gene-environment (G×E) interaction studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, which are a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of…

Efficient gene-environment interaction tests for large biobank-scale sequencing studies
Efficient Mixed-model Association tests for GEne-Environment interactions (MAGEE), for testing GEI between an aggregate variant set and environmental exposures on quantitative and binary traits in large-scale sequencing studies with related individuals, is proposed.
A scalable estimator of SNP heritability for biobank-scale data
A scalable randomized algorithm for estimating variance components in LMMs, based on a method‐of‐moment estimator that has a runtime complexity Symbol for N individuals and M SNPs and can reduce the time complexity to Symbol by leveraging the structure of the genotype matrix.
A unified powerful set-based test for sequencing data analysis of GxE interactions.
A hierarchical model is proposed to jointly assess the GxE effects of a set of rare variants (for example, in a gene or regulatory region), leveraging the information across the variants, and a novel testing procedure is developed by deriving two independent score statistics for the fixed effects and the variance component separately.
A Fast Multiple‐Kernel Method With Applications to Detect Gene‐Environment Interaction
This work proposes a computationally efficient and statistically rigorous “fastKM” algorithm for multikernel analysis that is based on a low‐rank approximation to the nuisance effect kernel matrices and shows that it has similar performance to an EM‐based KM approach for quantitative traits while running much faster.
The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses
GeneCards, the human gene compendium, enables researchers to effectively navigate and inter‐relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways and provides a stronger foundation for the GeneCards suite of companion databases and analysis tools.
Test for interactions between a genetic marker set and environment in generalized linear models.
If the main effects of multiple SNPs in a set are associated with a disease/trait, the classical single SNP-GE interaction analysis can be biased and have an inflated Type 1 error rate. A computationally efficient and powerful gene-environment set association test (GESAT) in generalized linear models is proposed.
FastSKAT: Sequence kernel association tests for very large sets of markers
The proposed fastSKAT provides computationally inexpensive but accurate approximations to the tail probabilities, in which the k largest eigenvalues of a weighted genotype covariance matrix or the largest singular values of a weighted genotype matrix are extracted, and a single term based on the Satterthwaite approximation is used for the remaining eigenvalues.
Complete Effect‐Profile Assessment in Association Studies With Multiple Genetic and Multiple Environmental Factors
The issues encountered in constructing kernels for investigating interactions between two factor‐sets are illustrated, and a simple yet intuitive solution to construct the G×E kernel that retains the ease‐of‐interpretation of classic regression is proposed.
Quantification of the overall contribution of gene-environment interaction for obesity-related traits
A robust maximum likelihood method is proposed for estimating the overall statistical interaction between a genetic risk score for a continuous outcome and all interacting environmental variables, without the need to measure any of them.
Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases.
Current and critical issues and themes in G×E research that need additional consideration are highlighted, including improved data analytical methods, environmental exposure assessment, and incorporation of functional data and annotations.
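
The fastSKAT entry in the list above describes keeping the k largest eigenvalues and replacing the rest with a single Satterthwaite term. A minimal sketch of that splitting step follows, under stated assumptions: the function name `topk_plus_satterthwaite` is hypothetical, the remainder is summarized as one scaled chi-square term a·χ²(ν) chosen to match the mean and variance of the remaining eigenvalue mixture, and a dense eigendecomposition is used only for illustration (a large-scale implementation would presumably extract just the leading eigenvalues). The tail-probability calculation itself is not shown.

```python
import numpy as np

def topk_plus_satterthwaite(K, k):
    """Summarize the spectrum of a symmetric PSD matrix K as its k largest
    eigenvalues plus one scaled chi-square term a * chi2(nu) for the rest.

    Illustrative sketch only; not the published fastSKAT implementation.
    """
    eigvals = np.linalg.eigvalsh(K)      # eigenvalues in ascending order
    top = eigvals[::-1][:k]              # the k largest eigenvalues

    # Sums over the remaining eigenvalues, obtained from traces so the
    # small eigenvalues never need to be known individually:
    # tr(K) = sum(lambda), tr(K^2) = sum(lambda^2) = squared Frobenius norm.
    rest_sum = np.trace(K) - top.sum()
    rest_sumsq = np.sum(K * K) - np.sum(top ** 2)

    # Satterthwaite moment matching: a * chi2(nu) has mean a * nu and
    # variance 2 * a^2 * nu, matched to rest_sum and 2 * rest_sumsq.
    a = rest_sumsq / rest_sum
    nu = rest_sum ** 2 / rest_sumsq
    return top, a, nu
```

By construction, the summarized mixture has the same mean and variance as the full eigenvalue mixture; only the shape of the remainder's distribution is approximated.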