A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

@inproceedings{Negahban2009AUF,
  title={A unified framework for high-dimensional analysis of \$M\$-estimators with decomposable regularizers},
  author={Sahand N. Negahban and Pradeep Ravikumar and Martin J. Wainwright and Bin Yu},
  booktitle={NIPS},
  year={2009}
}
High-dimensional statistical inference deals with models in which the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a…
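The abstract is truncated, but the estimators it refers to take the generic form below, where $\mathcal{L}_n$ is a convex loss over the n samples and $\mathcal{R}$ is a norm-based regularizer that is decomposable with respect to a subspace pair $(\mathcal{M}, \bar{\mathcal{M}}^{\perp})$; this is a sketch of the paper's setup, not a quotation from it.

```latex
% Regularized M-estimator: convex loss plus decomposable regularizer
\[
  \hat{\theta}_{\lambda_n} \;\in\; \arg\min_{\theta \in \mathbb{R}^p}
    \bigl\{ \mathcal{L}_n(\theta; Z_1^n) \;+\; \lambda_n \,\mathcal{R}(\theta) \bigr\}.
\]
% Decomposability of R with respect to the subspace pair (M, \bar{M}^\perp):
\[
  \mathcal{R}(u + v) \;=\; \mathcal{R}(u) + \mathcal{R}(v)
  \qquad \text{for all } u \in \mathcal{M},\; v \in \bar{\mathcal{M}}^{\perp}.
\]
```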

Structured Estimation In High-Dimensions

A unified framework for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling is presented, and it is shown that the same underlying statistical structure can be exploited to prove global geometric convergence of the gradient descent procedure up to statistical accuracy.

Penalised robust estimators for sparse and high-dimensional linear models

A new class of robust M-estimators is proposed for performing simultaneous parameter estimation and variable selection in high-dimensional regression models, together with a fast accelerated proximal gradient algorithm of coordinate-descent type that is implemented for computing the estimates.
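As a rough illustration of the coordinate-descent idea mentioned in this entry, one cycle of lasso coordinate descent with an ordinary squared-error loss is sketched below; the robust penalized M-estimators in that paper replace the squared-error loss with a robust loss, and the function names here are illustrative only.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Cyclic coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1.

    Illustrative sketch only (assumes no all-zero columns); robust penalized
    M-estimators replace the squared-error loss with a bounded/robust loss.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_norms = (X ** 2).sum(axis=0) / n        # per-coordinate curvature
    residual = y - X @ beta
    for _ in range(n_iters):
        for j in range(p):
            residual += X[:, j] * beta[j]       # remove coordinate j's contribution
            rho = X[:, j] @ residual / n        # correlation of x_j with partial residual
            beta[j] = soft_threshold(rho, lam) / col_norms[j]
            residual -= X[:, j] * beta[j]       # add the updated contribution back
    return beta
```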

A general framework for high-dimensional estimation in the presence of incoherence

  • Yuxin Chen, S. Sanghavi
  • Computer Science
    2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2010
This work provides a unified framework that broadly characterizes when incoherence will enable consistent estimation in the high-dimensional setting, and establishes that incoherence guarantees success in recovery for two broad classes of methods.

Robust Methods for High-Dimensional Regression and Covariance Matrix Estimation

The theory of M-estimators is built on and adapted to handle the problems of high-dimensional regression and covariance matrix estimation via regularization. It is shown that penalized M-estimators for high-dimensional generalized linear models can lead to estimators that are consistent when the data are well behaved and contain no contaminated observations, while importantly remaining stable in the presence of a small fraction of outliers.

Fast global convergence of gradient methods for solving regularized M-estimation

We analyze the convergence rates of composite gradient methods for solving problems based on regularized M-estimators, working within a high-dimensional framework that allows the data dimension d to grow with (and possibly exceed) the sample size n.

Adaptive Estimation In High-Dimensional Additive Models With Multi-Resolution Group Lasso

A multi-resolution group Lasso (MR-GL) method is proposed in a unified approach to simultaneously achieve or improve existing error bounds and provide new ones, without knowledge of the level of sparsity or the degree of smoothness of the unknown functions.
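For readers unfamiliar with the group Lasso penalty used in this entry (and in the Bach reference below), a minimal sketch of its proximal operator, blockwise soft-thresholding of coefficient groups, follows; the group structure and names are illustrative, not taken from the paper.

```python
import numpy as np

def group_soft_threshold(beta, groups, t):
    """Proximal operator of t * sum_g ||beta_g||_2 (the group-lasso penalty).

    `groups` is a list of index arrays, one per non-overlapping group.
    Each group's coefficient block is shrunk toward zero as a whole,
    so entire groups are either retained or zeroed out.
    """
    out = beta.copy()
    for g in groups:
        norm_g = np.linalg.norm(beta[g])
        scale = max(0.0, 1.0 - t / norm_g) if norm_g > 0 else 0.0
        out[g] = scale * beta[g]
    return out

# Example: 6 coefficients split into two groups of three.
beta = np.array([0.5, -0.2, 0.1, 2.0, 1.5, -1.0])
print(group_soft_threshold(beta, [np.arange(3), np.arange(3, 6)], t=1.0))
```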

Estimation of (near) low-rank matrices with noise and high-dimensional scaling

Simulations show excellent agreement with the high-dimensional scaling of the error predicted by the theory, and the consequences of the results are illustrated for a number of specific learning models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections.
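Nuclear-norm-based low-rank estimation of the kind described in this entry typically reduces to soft-thresholding of singular values; the numpy sketch below shows that proximal step on synthetic data, and is not the exact estimator analyzed in the paper.

```python
import numpy as np

def singular_value_threshold(A, t):
    """Proximal operator of t * ||.||_nuclear: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - t, 0.0)           # shrink every singular value by t
    return (U * s_shrunk) @ Vt                  # rebuild the (now lower-rank) matrix

# Example: denoise a noisy rank-2 matrix (typically recovers rank 2).
rng = np.random.default_rng(0)
L = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40))
noisy = L + 0.1 * rng.normal(size=L.shape)
print(np.linalg.matrix_rank(singular_value_threshold(noisy, t=2.0)))
```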

Fast global convergence of gradient methods for high-dimensional statistical recovery

The theory guarantees that projected gradient descent has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter $\theta^*$ and an optimal solution $\hat{\theta}$.
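The phrase "globally geometric rate of convergence up to statistical precision" in this entry (and in the journal version listed next) can be summarized by a bound of roughly the following shape; this is a paraphrase of the style of guarantee, not the exact theorem statement.

```latex
% Sketch: after t iterations of (projected/composite) gradient descent,
% for some contraction factor \kappa \in (0,1) and constant c,
\[
  \|\theta^{t} - \hat{\theta}\|^{2}
  \;\le\; \kappa^{t}\,\|\theta^{0} - \hat{\theta}\|^{2}
  \;+\; c\,\|\hat{\theta} - \theta^{*}\|^{2},
\]
% so the optimization error contracts geometrically until it reaches the
% order of the statistical error \|\hat{\theta} - \theta^{*}\|.
```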

Fast global convergence rates of gradient methods for high-dimensional statistical recovery

The theory guarantees that Nesterov's first-order method has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical Euclidean distance between the true unknown parameter $\theta^*$ and the optimal solution $\hat{\theta}$.

Estimation of high-dimensional low-rank matrices

This work investigates penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term and derives bounds for the $k$-th entropy numbers of the quasi-convex Schatten class embeddings $S^M_p \to S^M_2$, $p < 1$, which are of independent interest.
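The Schatten-p quasi-norm in this entry is simply the ℓp (quasi-)norm of the singular values; the short numpy illustration below is not from the paper.

```python
import numpy as np

def schatten_p(A, p):
    """Schatten-p (quasi-)norm: ||A||_{S_p} = (sum_i sigma_i(A)^p)^(1/p).

    For p >= 1 this is a norm (p=1 is the nuclear norm, p=2 the Frobenius norm);
    for 0 < p < 1 it is only a quasi-norm, which is the regime studied above.
    """
    s = np.linalg.svd(A, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

A = np.arange(12, dtype=float).reshape(3, 4)
print(schatten_p(A, 0.5), schatten_p(A, 1.0), np.linalg.norm(A, "nuc"))
```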
...

References

Showing 1–10 of 108 references

Estimation of (near) low-rank matrices with noise and high-dimensional scaling

Simulations show excellent agreement with the high-dimensional scaling of the error predicted by the theory, and the consequences of the results are illustrated for a number of specific learning models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections.

Estimation of high-dimensional low-rank matrices

This work investigates penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term and derives bounds for the $k$-th entropy numbers of the quasi-convex Schatten class embeddings $S^M_p \to S^M_2$, $p < 1$, which are of independent interest.

Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions

A general theorem is derived that bounds the Frobenius-norm error of an estimate of the matrix pair in high-dimensional matrix decomposition problems, obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer.
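Schematically, the convex program referred to in this entry combines a nuclear-norm penalty on the low-rank component with a second decomposable regularizer on the other component; the display below is a paraphrase of that setup, with Y denoting the noisy observation.

```latex
% Observe Y = Theta* + Gamma* + W (low-rank + structured + noise); estimate by
\[
  (\hat{\Theta}, \hat{\Gamma}) \;\in\; \arg\min_{\Theta,\,\Gamma}
    \Bigl\{ \tfrac{1}{2}\,\|Y - \Theta - \Gamma\|_{F}^{2}
      \;+\; \lambda\,\|\Theta\|_{\mathrm{nuc}}
      \;+\; \mu\,\mathcal{R}(\Gamma) \Bigr\},
\]
% where R is a decomposable regularizer, e.g. the elementwise l1 norm
% when Gamma* is sparse.
```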

LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA

Even though the Lasso cannot recover the correct sparsity pattern, the estimator is still consistent in the ℓ2-norm sense for fixed designs under conditions on (a) the number sn of non-zero components of the vector βn and (b) the minimal singular values of the design matrices induced by selecting on the order of sn variables.
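As a practical aside, an ℓ2-consistent Lasso fit of the kind studied in this reference can be computed with scikit-learn; the data and regularization level below are placeholders, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 500, 10                        # high-dimensional: p > n, s-sparse truth
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + 0.5 * rng.normal(size=n)

fit = Lasso(alpha=0.1).fit(X, y)              # alpha plays the role of the penalty level
print(np.linalg.norm(fit.coef_ - beta_true))  # l2 estimation error
```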

High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence

The first result establishes consistency of the estimate in the elementwise maximum norm, which allows convergence rates in Frobenius and spectral norms to be derived, and good correspondence is shown between the theoretical predictions and behavior in simulations.
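The ℓ1-penalized log-determinant program in this reference is commonly known as the graphical lasso, which scikit-learn implements; a minimal usage sketch with synthetic data follows, with arbitrary parameter values.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))               # n samples of a 20-dimensional vector

model = GraphicalLasso(alpha=0.05).fit(X)    # alpha = l1 penalty on the precision matrix
precision = model.precision_                 # sparse estimate of the inverse covariance
print((np.abs(precision) > 1e-8).sum())      # number of nonzero entries
```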

Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.

To guarantee sparsistency and the optimal rate of convergence, the number of nonzero elements should be small: s_n' = O(p_n) at most, among O(p_n^2) parameters, when estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor.

Restricted Eigenvalue Properties for Correlated Gaussian Designs

This paper proves directly that the restricted nullspace and eigenvalue conditions hold with high probability for quite general classes of Gaussian matrices for which the predictors may be highly dependent, and hence restricted isometry conditions can be violated with high probability.
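For context, the restricted eigenvalue condition discussed in this reference requires the design to be well conditioned only over a cone of approximately sparse directions; the display below is one common formulation of that condition, not a quotation from the paper.

```latex
% Restricted eigenvalue (RE) condition over the support set S:
\[
  \frac{\|X v\|_{2}^{2}}{n} \;\ge\; \kappa\,\|v\|_{2}^{2}
  \qquad \text{for all } v \neq 0 \text{ such that } \|v_{S^{c}}\|_{1} \le 3\,\|v_{S}\|_{1},
\]
% i.e. the quadratic form is bounded below on the cone of vectors whose mass
% concentrates on S, even though X^T X / n is singular whenever p > n.
```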

Covariance regularization by thresholding

This paper considers regularizing a covariance matrix of $p$ variables estimated from $n$ observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm, provided the true covariance matrix is sparse in an appropriate sense.
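Hard thresholding of a sample covariance matrix, as studied in this reference, is a one-liner in numpy; the threshold value below is a placeholder, whereas the theory ties it to a multiple of sqrt(log p / n).

```python
import numpy as np

def threshold_covariance(X, thresh):
    """Hard-threshold the off-diagonal entries of the sample covariance of X."""
    S = np.cov(X, rowvar=False)                       # p x p sample covariance
    T = np.where(np.abs(S) >= thresh, S, 0.0)         # zero out small entries
    np.fill_diagonal(T, np.diag(S))                   # keep the diagonal (variances)
    return T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
print((threshold_covariance(X, thresh=0.2) != 0).mean())  # fraction of entries kept
```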

Optimal selection of reduced rank estimators of high-dimensional matrices

A new criterion, the Rank Selection Criterion (RSC), is introduced for selecting the optimal reduced-rank estimator of the coefficient matrix in multivariate response regression models; it has very low computational complexity, linear in the number of candidate models, making it particularly appealing for large-scale problems.

Consistency of the group Lasso and multiple kernel learning

  • F. Bach
  • Computer Science, Mathematics
    J. Mach. Learn. Res.
  • 2008
This paper derives necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, and proposes an adaptive scheme to obtain a consistent model estimate even when the necessary condition required for the non-adaptive scheme is not satisfied.
...