Heritability estimation in high dimensional sparse linear mixed models

Anna Bonnet, Elisabeth Gassiat, Céline Lévy-Leduc
Electronic Journal of Statistics
Motivated by applications in genetics, we propose to estimate the heritability in high-dimensional sparse linear mixed models. The heritability determines how the variance is shared between the different random components of a linear mixed model. The main novelty of our approach is to allow the random effects to be sparse, that is, they may contain null components, although we know neither the proportion of null components nor their positions. The estimator that we consider is strongly inspired by the…
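As a rough illustration of how variance is shared between a sparse random component and the noise, here is a minimal sketch. It assumes a model of the form Y = Zu + e, where each of the N random effects is nonzero with probability q and has variance sigma_u2 when nonzero, the noise has variance sigma_e2, and the heritability is taken to be N q sigma_u2 / (N q sigma_u2 + sigma_e2). That formula, the function names, and the parameter names are assumptions made for this example; the truncated abstract above does not state them, and this is not the estimator proposed in the paper.

```python
import random
import statistics


def heritability(n_effects, q, sigma_u2, sigma_e2):
    """Assumed definition: share of total variance carried by the sparse random effects."""
    genetic = n_effects * q * sigma_u2
    return genetic / (genetic + sigma_e2)


def simulated_variance_share(n_obs, n_effects, q, sigma_u2, sigma_e2, seed=0):
    """Simulate Y = Z u + e and return var(Z u) / (var(Z u) + var(e)) for one draw."""
    rng = random.Random(seed)
    # Sparse random effects: N(0, sigma_u2) with probability q, exactly zero otherwise.
    u = [rng.gauss(0.0, sigma_u2 ** 0.5) if rng.random() < q else 0.0
         for _ in range(n_effects)]
    nonzero = [u_j for u_j in u if u_j != 0.0]
    genetic_part, noise_part = [], []
    for _ in range(n_obs):
        # Entries of Z are i.i.d. standard normal here (a stand-in for standardized
        # covariates); columns matching null effects contribute nothing, so they are skipped.
        genetic_part.append(sum(rng.gauss(0.0, 1.0) * u_j for u_j in nonzero))
        noise_part.append(rng.gauss(0.0, sigma_e2 ** 0.5))
    var_g = statistics.pvariance(genetic_part)
    var_e = statistics.pvariance(noise_part)
    return var_g / (var_g + var_e)


if __name__ == "__main__":
    params = dict(n_effects=500, q=0.1, sigma_u2=0.02, sigma_e2=1.0)
    print("assumed heritability:", heritability(**params))
    print("simulated variance share:", simulated_variance_share(n_obs=200, **params))
```

The simulated share fluctuates around the assumed value because it depends on the particular draw of nonzero effects; the point of the sketch is only that the quantity of interest does not require knowing which effects are null.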

Optimal Estimation of Genetic Relatedness in High-Dimensional Linear Models

Estimating the genetic relatedness between two traits based on genome-wide association data is an important problem in genetics research. In the framework of high-dimensional linear…

Improving heritability estimation by a variable selection approach in sparse high dimensional linear mixed models

A novel methodology to estimate heritability, which corresponds to the proportion of phenotypic variance that can be explained by genetic factors, is proposed and implemented in the R package EstHer and applied on neuroanatomical data from the project IMAGEN.

Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models

This work reviews and compares several estimators of variances and of the random slopes and errors of high-dimensional linear regression models and demonstrates the superior accuracy of the resulting MML estimator of λ as compared to CV.

Boosting heritability: estimating the genetic component of phenotypic variation with multiple sample splitting

This paper proposes a generic strategy for heritability inference, termed “boosting heritability”, which combines the advantageous features of different recent methods to produce an estimate of the heritability with a high-dimensional linear model.

The Mahalanobis kernel for heritability estimation in genome-wide association studies: fixed-effects and random-effects methods

It is shown that reliance on the Euclidean distance kernel contributes to several unresolved modeling inconsistencies in heritability estimation for GWAS. A new definition of partitioned heritability (the heritability attributable to a subset of genes or single nucleotide polymorphisms) is proposed using the Mahalanobis GRM, and it inherits many of the nice consistency properties identified in the original analysis.

Heritability estimation in case-control studies

The main result is a proof of the consistency of this estimator under several assumptions that are stated and discussed, together with a numerical study comparing two approximations that lead to two heritability estimators.

Statistical Inference for Genetic Relatedness Based on High-Dimensional Logistic Regression

The proposed methods are shown to yield novel insights into the shared genetic architecture of ten pediatric autoimmune diseases, demonstrating their superiority and their applicability to the analysis of real genetic data.

Fixed Effects Testing in High-Dimensional Linear Mixed Models

A hypothesis test and the corresponding p-value for testing for the significance of the homogeneous structure in linear mixed models are developed and a robust matching moment construction is used for creating a test that adapts to the size of the model sparsity.

EigenPrism: inference for high dimensional signal‐to‐noise ratios

A novel procedure, called EigenPrism, is derived. It is asymptotically correct when the covariates are multivariate Gaussian, produces valid confidence intervals in finite samples as well, and is applied to a genetic data set to estimate the genetic signal‐to‐noise ratio for a number of continuous phenotypes.

A Unified Approach to Robust Inference for Genetic Covariance

The asymptotic properties of the proposed estimator are provided, and it is shown that the proposal is robust under certain model misspecification, giving robust inference for the narrow-sense genetic covariance even when both linear models are mis-specified.



Polygenic Modeling with Bayesian Sparse Linear Mixed Models

This work applies the Bayesian sparse linear mixed model (BSLMM) and compares it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction. It demonstrates that BSLMM considerably outperforms either of the other two methods.

Accurate estimation of heritability in genome wide studies using random effects models

It is demonstrated that this method leads to more stable and accurate heritability estimation compared to the approach of Yang et al. (2010), and it also allows us to find ML estimates of the portion of markers which are causal, indicating whether the heritability stems from a small number of powerful genetic factors or a large number of less powerful ones.


It is proved that, with the proxy matrix appropriately chosen, the proposed procedure can identify all true random effects with asymptotic probability one, where the dimension of random effects vector is allowed to increase exponentially with the sample size.

Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies

A standard linear model with one additional random effect is considered for situations where many predictors have been collected on the same subjects and each predictor is analyzed separately; it has been successfully applied to a large-scale association study of multiple sclerosis.

Common SNPs explain a large proportion of the heritability for human height

Evidence is provided that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.

Bayesian variable selection regression for genome-wide association studies and other large-scale problems

The potential for BVSR to estimate the total proportion of variance in outcome explained by relevant covariates is emphasized, to shed light on the issue of "missing heritability" in genome-wide association studies.

GCTA: a tool for genome-wide complex trait analysis.

Genetics and Analysis of Quantitative Traits

This book discusses the genetic basis of quantitative variation, properties of distributions, covariance, regression and correlation, properties of single loci, and sources of genetic variation for multilocus traits.

Stability selection

It is proved for the randomized lasso that stability selection will be variable selection consistent even if the necessary conditions for consistency of the original lasso method are violated.