Corpus ID: 248693135

A zero-estimator approach for estimating the signal level in a high-dimensional model-free setting

Ilan Livne, David Azriel, Yair Goldberg
We study a high-dimensional regression setting under the assumption of a known covariate distribution. We aim to estimate the amount of variation in the response explained by the best linear function of the covariates (the signal level). In our setting, neither sparsity of the coefficient vector, normality of the covariates, nor linearity of the conditional expectation is assumed. We present an unbiased and consistent estimator and then improve it by using a zero-estimator approach, where a…
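The zero-estimator idea can be illustrated in its simplest control-variate form. The sketch below is hypothetical and not the paper's actual estimator: the covariate distribution is taken to be standard normal, so E[X] = 0 is known exactly, and subtracting a fitted multiple of a statistic with known mean zero reduces the variance of a naive moment estimator without introducing bias:

```python
import numpy as np

rng = np.random.default_rng(0)

# Known covariate distribution: standard normal, so E[X] = 0 exactly.
n = 2000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)      # true E[XY] = 2

# Naive unbiased estimator of E[XY]: the sample mean of the products.
t = x * y
naive = np.mean(t)

# Zero-estimator: the sample mean of X, whose expectation is known to be 0.
z = x

# Optimal coefficient c = Cov(T, Z) / Var(Z) minimises Var(T - c*Z).
c = np.cov(t, z, ddof=0)[0, 1] / np.var(z)
improved = naive - c * np.mean(z)

# Still unbiased (a known-zero-mean quantity was subtracted), and the
# per-observation variance can only decrease.
assert np.var(t - c * z) <= np.var(t)
```

The same mechanism generalises: any function of the covariates whose expectation is known (because the covariate distribution is known) can serve as a zero-estimator.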

Improved estimators for semi-supervised high-dimensional regression model

An unbiased, consistent, and asymptotically normal estimator is proposed, which can be improved by adding zero-estimators arising from the unlabelled data; an algorithm is also presented that improves estimation for any given variance estimator.

Adaptive estimation of high-dimensional signal-to-noise ratios

This work considers the equivalent problems of estimating the residual variance, the proportion of explained variance $\eta$, and the signal strength in a high-dimensional linear regression model with Gaussian random design, and builds an adaptive procedure whose convergence rate achieves the minimax risk up to a logarithmic loss.

The conditionality principle in high-dimensional regression

Consider a high-dimensional linear regression problem, where the number of covariates is larger than the number of observations and the interest is in estimating the conditional variance of the response given the covariates.

Variance estimation in high-dimensional linear models

The residual variance and the proportion of explained variation are important quantities in many statistical models and model fitting procedures, and play an important role in regression diagnostics.

Scaled sparse linear regression

Scaled sparse linear regression jointly estimates the regression coefficients and the noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level and rescaling the penalty in proportion to it.
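The alternating scheme can be sketched for the special case of an orthonormal design, where the lasso step has a closed form via soft thresholding (a simplification; the actual method handles general designs via full lasso fits). The function name `scaled_lasso_orthonormal` and the tuning choice `lam0` are illustrative:

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise lasso shrinkage operator."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def scaled_lasso_orthonormal(X, y, lam0, n_iter=50):
    """Alternate between a lasso fit (closed form when X.T @ X / n = I)
    and a noise-level refit, with the penalty scaled by the current sigma."""
    n = len(y)
    sigma = np.std(y)                        # crude initial noise level
    for _ in range(n_iter):
        beta = soft_threshold(X.T @ y / n, lam0 * sigma)
        sigma = np.linalg.norm(y - X @ beta) / np.sqrt(n)
    return beta, sigma

# Orthonormal design, two nonzero coefficients, unit noise variance.
rng = np.random.default_rng(1)
n, p = 400, 10
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
X = Q * np.sqrt(n)                           # columns satisfy X.T @ X / n = I
beta_true = np.zeros(p)
beta_true[0], beta_true[1] = 3.0, -2.0
y = X @ beta_true + rng.normal(size=n)

lam0 = np.sqrt(2.0 * np.log(p) / n)          # universal penalty level
beta_hat, sigma_hat = scaled_lasso_orthonormal(X, y, lam0)
```

The fixed point couples the two estimates: a larger noise level implies a heavier penalty and a sparser fit, whose residuals in turn determine the noise level.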

Statistical Inference on Explained Variation in High-dimensional Linear Model with Dense Effects

This paper proposes an estimating equation approach to estimation and inference on the explained variation in the high-dimensional linear model, and shows that the proposed estimator is consistent and asymptotically normally distributed under reasonable conditions.

High-dimensional semi-supervised learning: in search of optimal inference of the mean

A novel k-fold cross-fitted, doubly robust estimator is illustrated, particularly suited to models that naturally do not admit root-n consistency, such as high-dimensional, nonparametric, or semiparametric models, and is applied to heterogeneous treatment effects.

EigenPrism: inference for high dimensional signal‐to‐noise ratios

A novel procedure, called EigenPrism, is derived; it is asymptotically correct when the covariates are multivariate Gaussian and also produces valid confidence intervals in finite samples. The method is applied to a genetic data set to estimate the genetic signal-to-noise ratio for a number of continuous phenotypes.

Variance estimation using refitted cross‐validation in ultrahigh dimensional regression

A two-stage refitted procedure via a data-splitting technique, called refitted cross-validation, is proposed to attenuate the influence of irrelevant variables with high spurious correlations. Results show that the resulting procedure performs as well as the oracle estimator, which knows the mean regression function in advance.
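The two-stage idea can be sketched as follows. This is a simplified, hypothetical version: correlation screening stands in for the paper's variable-selection step, and `k` (the number of variables kept) is an illustrative tuning parameter. Variables spuriously selected on one half are "refitted away" on the independent other half, so the residual variance estimate is not biased downward by overfitting:

```python
import numpy as np

def rcv_variance(X, y, k, seed=0):
    """Refitted cross-validation sketch: select variables on one half of
    the data, refit them by least squares on the other half, estimate the
    residual variance there, then swap the halves and average."""
    rng = np.random.default_rng(seed)
    n = len(y)
    halves = np.array_split(rng.permutation(n), 2)
    estimates = []
    for a, b in [(0, 1), (1, 0)]:
        sel, fit = halves[a], halves[b]
        # Screening: keep the k covariates most correlated with y.
        corr = np.abs(X[sel].T @ (y[sel] - y[sel].mean()))
        keep = np.argsort(corr)[-k:]
        # Refit the selected variables by OLS on the independent half.
        Xf = X[fit][:, keep]
        beta, *_ = np.linalg.lstsq(Xf, y[fit], rcond=None)
        resid = y[fit] - Xf @ beta
        estimates.append(resid @ resid / (len(fit) - k))
    return float(np.mean(estimates))

# Sparse linear model with unit noise variance.
rng = np.random.default_rng(2)
n, p = 400, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[0], beta[1] = 3.0, -2.0
y = X @ beta + rng.normal(size=n)

sigma2_hat = rcv_variance(X, y, k=2)   # should be close to the true value 1
```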

Semisupervised inference for explained variance in high dimensional linear regression and its applications

  • T. Tony Cai, Zijian Guo
  • Computer Science, Mathematics
    Journal of the Royal Statistical Society: Series B (Statistical Methodology)
  • 2020
It is shown that the estimator achieves the minimax optimal rate of convergence in the general semisupervised framework, and the optimality result characterizes how the unlabelled data contribute to the estimation accuracy.