Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders

Authors: Muhammad Ammar Malik and Tom Michoel
Journal: G3: Genes|Genomes|Genetics
Random effect models are popular statistical models for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in random effect models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting… 
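To make "maximizing the likelihood function" concrete, here is a minimal sketch of the kind of objective such a model involves. It assumes a Gaussian random effect model in which each gene's expression vector across n samples is drawn from N(0, K), with K = Z diag(b) Zᵀ + X Xᵀ + σ² I, where Z holds known confounders and X holds latent factors. The function names, the diagonal covariance for the known confounders, and this exact parameterization are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def covariance(Z, X, b, sigma2):
    """Sample covariance K = Z diag(b) Z^T + X X^T + sigma2 * I (assumed form)."""
    n = Z.shape[0]
    # Z * b scales column j of Z by b[j], i.e. computes Z @ diag(b).
    return (Z * b) @ Z.T + X @ X.T + sigma2 * np.eye(n)

def neg_log_likelihood(Y, Z, X, b, sigma2):
    """Average negative log-likelihood per gene.

    Y is an (n samples x m genes) matrix; each column is treated as an
    independent draw from N(0, K).
    """
    n, m = Y.shape
    K = covariance(Z, X, b, sigma2)
    _, logdet = np.linalg.slogdet(K)
    C = Y @ Y.T / m  # empirical sample-by-sample covariance
    quad = np.trace(np.linalg.solve(K, C))  # tr(K^{-1} C)
    return 0.5 * (logdet + quad + n * np.log(2.0 * np.pi))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 10, 200
    Z = rng.standard_normal((n, 2))   # hypothetical known confounders
    X = rng.standard_normal((n, 1))   # hypothetical latent factor
    Y = rng.standard_normal((n, m))   # placeholder expression data
    print(neg_log_likelihood(Y, Z, X, np.array([1.0, 0.5]), 0.3))
```

A gradient-based optimizer would minimize this function numerically over `b`, `X`, and `sigma2` together, which is the approach the abstract describes as the existing practice.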


A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies
VBQTL is a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype and from known and hidden confounding factors. It yields more precise estimates of the contribution of different confounding factors and, compared to alternatives, finds additional associations to measured transcript levels.
Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies
PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis, consistently performs better than alternative methods, and finds in particular substantially more trans regulators.
Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses
We present PEER (probabilistic estimation of expression residuals), a software package implementing statistical models that improve the sensitivity and interpretability of genetic associations in
Variance component model to account for sample structure in genome-wide association studies
This work reports a variance component approach, implemented in the publicly available software EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours.
Correction for hidden confounders in the genetic analysis of gene expression
A statistical model is presented that jointly corrects for two particular kinds of hidden structure, population structure (e.g., race, family-relatedness) and microarray expression artifacts (e.g., batch effects), when these confounders are unknown.
Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis
This work introduces “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies and shows that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
Accurate Discovery of Expression Quantitative Trait Loci Under Confounding From Spurious and Genuine Regulatory Hotspots
Applying the intersample correlation emended (ICE) eQTL mapping method to mouse, yeast, and human identifies many more cis associations while eliminating most of the spurious trans associations, demonstrating the higher accuracy of the method to identify real genetic effects.
Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models
A novel probabilistic interpretation of principal component analysis (PCA) that is based on a Gaussian process latent variable model (GP-LVM), and related to popular spectral techniques such as kernel PCA and multidimensional scaling.
Expression reflects population structure
High dimensional, multi-modal genomics datasets are becoming increasingly common, which warrants investigation into analysis techniques that can reveal structure in the data without over-fitting, and the coupling of principal component analysis to canonical correlation analysis offers an efficient approach to exploratory analysis of this kind of data.
Genome-wide Efficient Mixed Model Analysis for Association Studies
This method is approximately n times faster than the widely used exact method known as efficient mixed-model association (EMMA), where n is the sample size, making exact genome-wide association analysis computationally practical for large numbers of individuals.