# High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

@inproceedings{Loh2011HighdimensionalRW,
title={High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity},
author={Po-Ling Loh and Martin J. Wainwright},
booktitle={NIPS},
year={2011}
}
• Published in NIPS 16 September 2011
• Computer Science
Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependence, as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization…
456 Citations

High-dimensional statistics with systematically corrupted data
New methods for obtaining high-dimensional regression estimators in the presence of corrupted data are presented, and theoretical guarantees for the statistical consistency of these methods are provided.
Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery
• Computer Science
ICML
• 2013
This paper develops a simple variant of orthogonal matching pursuit (OMP) for sparse regression, and shows that without knowledge of the noise covariance, the algorithm recovers the support, and provides matching lower bounds that show that the algorithm performs at the minimax optimal rate.
Orthogonal Matching Pursuit with Noisy and Missing Data: Low and High Dimensional Results
• Computer Science
ArXiv
• 2012
These OMP-like algorithms are as efficient as OMP and improve on the best-known results for missing and noisy data in regression, both in the high-dimensional setting, where the authors seek to recover a sparse vector from only a few measurements, and in the classical low-dimensional setting, where they recover an unstructured regressor.
Pattern alternating maximization algorithm for missing data in high-dimensional problems
• Computer Science, Mathematics
J. Mach. Learn. Res.
• 2014
This work proposes a novel and efficient algorithm for maximizing the observed log-likelihood of a multivariate normal data matrix with missing values and shows that the new method often improves upon other modern imputation techniques such as k-nearest neighbors imputation, nuclear norm minimization or a penalized likelihood approach with an l1-penalty on the concentration matrix.
Convex and Non-convex Approaches for Statistical Inference with Noisy Labels
• Computer Science, Mathematics
• 2019
To the best of the authors' knowledge, this is the first work providing point estimation guarantees and hypothesis testing results for GLMs with non-canonical link functions, which is of independent interest.
Minimax Rates of ℓp-Losses for High-Dimensional Linear Errors-in-Variables Models over ℓq-Balls
• Computer Science, Mathematics
Entropy
• 2021
The established lower and upper bounds on minimax risks agree up to constant factors when p=2, which together provide the information-theoretic limits of estimating a sparse vector in the high-dimensional linear errors-in-variables model.
Balanced estimation for high-dimensional measurement error models
• Computer Science
Comput. Stat. Data Anal.
• 2018
High Dimensional Structured Estimation with Noisy Designs
• Computer Science
SDM
• 2016
It is shown that without any information about the noise in covariates, currently established techniques of bounding statistical error of estimation fail to provide consistency guarantees, but when information about noise covariance is available or can be estimated, then consistency guarantees for any norm regularizer are proved.
Penalised robust estimators for sparse and high-dimensional linear models
• Computer Science
• 2020
This work proposes a new class of robust M-estimators for performing simultaneous parameter estimation and variable selection in high-dimensional regression models, and proposes and implements a fast accelerated proximal gradient algorithm, of coordinate descent type, for computing the estimates.
Statistical consistency and asymptotic normality for high-dimensional robust M-estimators
This work establishes a form of local statistical consistency for the penalized regression estimators under fairly mild conditions on the error distribution, and analysis of the local curvature of the loss function has useful consequences for optimization when the robust regression function and/or regularizer is nonconvex and the objective function possesses stationary points outside the local region.

## References

SHOWING 1-10 OF 37 REFERENCES
Fast global convergence of gradient methods for high-dimensional statistical recovery
• Computer Science
ArXiv
• 2011
The theory guarantees that projected gradient descent has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter $\theta^*$ and an optimal solution $\hat{\theta}$.
A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
• Computer Science, Mathematics
NIPS
• 2009
A unified framework for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling is provided; one main theorem is stated, and it is shown how it can be used to re-derive several existing results and to obtain several new ones.
Estimation of (near) low-rank matrices with noise and high-dimensional scaling
• Computer Science
ICML
• 2010
Simulations show excellent agreement with the high-dimensional scaling of the error predicted by the theory, and illustrate their consequences for a number of specific learning models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections.
Restricted Eigenvalue Properties for Correlated Gaussian Designs
• Computer Science
J. Mach. Learn. Res.
• 2010
This paper proves directly that the restricted nullspace and eigenvalue conditions hold with high probability for quite general classes of Gaussian matrices for which the predictors may be highly dependent, and hence restricted isometry conditions can be violated with high probability.
Missing values: sparse inverse covariance estimation and an extension to sparse regression
• Computer Science, Mathematics
Stat. Comput.
• 2012
An efficient EM algorithm for optimization with provable numerical convergence properties is proposed and the methodology to handle missing values in a sparse regression context is extended.
LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA
• Computer Science
• 2009
Even though the Lasso cannot recover the correct sparsity pattern, the estimator is still consistent in the $\ell_2$-norm sense for fixed designs under conditions on (a) the number $s_n$ of non-zero components of the vector $\beta_n$ and (b) the minimal singular values of the design matrices that are induced by selecting on the order of $s_n$ variables.
The sparsity and bias of the Lasso selection in high-dimensional linear regression
• Mathematics
• 2008
Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436-1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent,…
Sparse recovery under matrix uncertainty
• Mathematics, Computer Science
• 2010
New estimators called matrix uncertainty selectors (MU-selectors for short) are suggested, which are close to $\theta^*$ in different norms and in the prediction risk if the restricted eigenvalue assumption on X is satisfied.
Improved Matrix Uncertainty Selector
• Computer Science, Mathematics
• 2013
This paper proposes a modification of the MU selector when Ξ is a random matrix with zero-mean entries whose variances can be estimated, and shows that the new estimator, called the Compensated MU selector, achieves better accuracy of estimation than the original MU selector.
High-dimensional graphs and variable selection with the Lasso
• Computer Science
• 2006
It is shown that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs and is hence equivalent to variable selection for Gaussian linear models.