• Corpus ID: 246210238

Statistical Inference on Explained Variation in High-dimensional Linear Model with Dense Effects

  • Hua Yun Chen
  • Published 18 January 2022
  • Computer Science
Statistical inference on the explained variation of an outcome by a set of covariates is of particular interest in practice. When the covariates are of moderate to high dimension and the effects are not sparse, several approaches have been proposed for estimation and inference. One major problem with the existing approaches is that the inference procedures are not robust to the normality assumption on the covariates and the residual errors. In this paper, we propose an estimating equation…

Statistical Methods for Assessing the Explained Variation of a Health Outcome by a Mixture of Exposures

Exposures to environmental pollutants are often composed of mixtures of chemicals that can be highly correlated because of similar sources and/or chemical structures. The effect of an individual

Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) Program: Novel and Expanded Statistical Methods

Thirty-seven new methods from PRIME projects are reviewed and summarized to enable more informed analyses of environmental mixtures, and training for early-career scientists and innovation in statistical methodology are stressed as ongoing needs.

A zero-estimator approach for estimating the signal level in a high-dimensional model-free setting

This work presents an unbiased and consistent estimator and then improves it by using a zero-estimator approach, where a zero-estimator is a statistic whose expected value is zero.



Variance estimation in high-dimensional linear models

The residual variance and the proportion of explained variation are important quantities in many statistical models and model fitting procedures. They play an important role in regression diagnostics

Maximum Likelihood for Variance Estimation in High-Dimensional Linear Models

We study maximum likelihood estimators (MLEs) for the residual variance, the signal-to-noise ratio, and other variance parameters in high-dimensional linear models. These parameters are essential in

Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications

It is shown that the estimator achieves the minimax optimal rate of convergence in the general semi-supervised framework and the limiting distribution for the proposed estimator is established and data-driven confidence intervals for the explained variance are constructed.

Fast and Accurate Construction of Confidence Intervals for Heritability.

EigenPrism: inference for high dimensional signal‐to‐noise ratios

A novel procedure, called EigenPrism, is derived; it is asymptotically correct when the covariates are multivariate Gaussian, produces valid confidence intervals in finite samples as well, and is applied to a genetic data set to estimate the genetic signal‐to‐noise ratio for a number of continuous phenotypes.

Optimal Estimation of Co-heritability in High-dimensional Linear Models

Co-heritability is an important concept that characterizes the genetic associations within pairs of quantitative traits. There has been significant recent interest in estimating the co-heritability

Variance estimation using refitted cross‐validation in ultrahigh dimensional regression

A two‐stage refitted procedure via a data splitting technique, called refitted cross‐validation, is proposed to attenuate the influence of irrelevant variables with high spurious correlations, and results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function.

Adaptive estimation of high-dimensional signal-to-noise ratios

This work considers the equivalent problems of estimating the residual variance, the proportion of explained variance $\eta$ and the signal strength in a high-dimensional linear regression model with Gaussian random design and builds an adaptive procedure whose convergence rate achieves the minimax risk over all up to a logarithmic loss.

Common SNPs explain a large proportion of the heritability for human height

Evidence is provided that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.

On asymptotics of eigenvectors of large sample covariance matrix

Let $\{X_{ij}\}$, $i, j = \ldots$, be a double array of i.i.d. complex random variables with $EX_{11} = 0$, $E|X_{11}|^2 = 1$ and $E|X_{11}|^4 < \infty$, and let $A_n = \frac{1}{N} T_n^{1/2} X_n X_n^* T_n^{1/2}$, where $T_n^{1/2}$ is the