The Dantzig selector: Statistical estimation when p is much larger than n

@article{Cands2007TheDS,
  title={The Dantzig selector: Statistical estimation when p is much larger than n},
  author={Emmanuel J. Cand{\`e}s and Terence Tao},
  journal={The Annals of Statistics},
  year={2007},
  volume={35},
  pages={2313-2351}
}
In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Xβ + z, where β ∈ ℝ^p is a parameter vector of interest, X is a data matrix with possibly far fewer rows than columns, n ≪ p, and the z_i's are i.i.d. N(0, σ²). Is it possible to estimate β reliably based on the noisy data y?
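
The estimator studied in the paper is the Dantzig selector, the solution of min ‖β̃‖_1 subject to ‖Xᵀ(y − Xβ̃)‖_∞ ≤ λ_p·σ with λ_p = √(2 log p), which is a linear program. The sketch below sets up that linear program with SciPy on simulated data; the problem sizes, column normalization, and choice of the HiGHS solver are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: the Dantzig selector as a linear program (assumed setup).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, s, sigma = 72, 256, 8, 1.0                 # n << p, s-sparse truth (illustrative sizes)
X = rng.standard_normal((n, p)) / np.sqrt(n)     # roughly unit-norm columns
beta_true = np.zeros(p)
beta_true[rng.choice(p, s, replace=False)] = rng.normal(0.0, 10.0, s)
y = X @ beta_true + sigma * rng.standard_normal(n)

lam = sigma * np.sqrt(2 * np.log(p))             # lambda_p * sigma, as in the paper

# Variables z = [beta; u]; minimize sum(u) subject to |beta_j| <= u_j and
# |X'(y - X beta)| <= lam componentwise.
I = np.eye(p)
XtX, Xty = X.T @ X, X.T @ y
c = np.concatenate([np.zeros(p), np.ones(p)])
A_ub = np.block([
    [ I,   -I],                     #  beta - u <= 0
    [-I,   -I],                     # -beta - u <= 0
    [-XtX, np.zeros((p, p))],       #  X'y - X'X beta <= lam
    [ XtX, np.zeros((p, p))],       # -X'y + X'X beta <= lam
])
b_ub = np.concatenate([np.zeros(2 * p), lam - Xty, lam + Xty])
bounds = [(None, None)] * p + [(0, None)] * p

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
beta_hat = res.x[:p]
print("squared error:", np.sum((beta_hat - beta_true) ** 2))
```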

Discussion: The Dantzig selector: Statistical estimation when p is much larger than n

TLDR
The conditions of this paper for the Dantzig selector and those of Bunea, Tsybakov and Wegkamp for the Lasso are presented together, since these authors emphasize different points and use different normalizations.

Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation

TLDR
The multi-step thresholding procedure can accurately estimate a sparse vector β ∈ ℝ^p in a linear model under restricted eigenvalue conditions; moreover, if X obeys a uniform uncertainty principle and the true parameter is sufficiently sparse, the Gauss-Dantzig selector achieves the ℓ2 loss within a logarithmic factor of the ideal mean squared error.
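
For concreteness, a minimal sketch of the two-stage Gauss-Dantzig style refinement mentioned above: threshold a first-pass estimate, then refit by least squares on the selected support. The `dantzig_selector` argument stands for any first-stage routine (for example the linear-programming sketch given earlier), and thresholding at the noise level σ is one illustrative rule, not the only one analyzed.

```python
# Assumed sketch of a two-stage Gauss-Dantzig style estimator.
import numpy as np

def gauss_dantzig(X, y, sigma, dantzig_selector):
    """Threshold a first-pass estimate, then refit by least squares on its support."""
    p = X.shape[1]
    beta1 = dantzig_selector(X, y, sigma)              # first-stage estimate
    support = np.flatnonzero(np.abs(beta1) > sigma)    # keep coordinates above the noise level
    beta2 = np.zeros(p)
    if support.size:
        # ordinary least squares restricted to the selected columns
        beta2[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return beta2
```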

Distributed Testing and Estimation under Sparse High Dimensional Models

TLDR
This paper addresses the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible.

Statistical Estimation in High Dimension, Sparsity and Oracle Inequalities

TLDR
This work studies the statistical properties of two types of procedures: penalized risk minimization procedures, with a penalty term on the set of potential parameters, and exponential weights procedures; it establishes oracle inequalities for the L^π norm, 1 ≤ π ≤ ∞.

Simultaneous Analysis of Lasso and Dantzig Selector, by Peter J. Bickel, Ya’acov Ritov

TLDR
It is shown that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior, and oracle inequalities for the prediction risk in the general nonparametric regression model and bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model are derived.

Distributed Estimation and Inference with Statistical Guarantees

TLDR
This paper addresses the important question of how to choose k as n grows large, providing a theoretical upper bound on k such that the information loss due to the divide and conquer algorithm is negligible.

Statistical Optimization in High Dimensions

TLDR
This work proposes three algorithms to address the high-dimensional regime, where the number of samples is roughly equal to the dimensionality of the problem, and the noise magnitude may greatly exceed the magnitude of the signal itself.

Near-ideal model selection by ℓ1 minimization

TLDR
It is proved that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error that one would achieve with an oracle supplying perfect information about which variables should and should not be included in the model.
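
The "simple quadratic program" here is ℓ1-penalized least squares (the lasso). A minimal sketch using scikit-learn, where the σ·√(2 log p) penalty scaling and the use of `sklearn.linear_model.Lasso` are illustrative choices rather than the cited paper's exact setup:

```python
# Assumed sketch: the lasso as an l1-penalized least-squares program.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_estimate(X, y, sigma):
    n, p = X.shape
    lam = 2 * sigma * np.sqrt(2 * np.log(p))      # illustrative penalty level
    # scikit-learn minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1, so alpha = lam/n
    model = Lasso(alpha=lam / n, fit_intercept=False)
    return model.fit(X, y).coef_
```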
...

References

SHOWING 1-10 OF 117 REFERENCES

Lasso-Type Recovery of Sparse Representations for High-Dimensional Data

TLDR
Even though the Lasso cannot recover the correct sparsity pattern, the estimator is still consistent in the ℓ2-norm sense for fixed designs, under conditions on (a) the number s_n of non-zero components of the vector β_n and (b) the minimal singular values of the design matrices that are induced by selecting of order s_n variables.

Sparsity oracle inequalities for the Lasso

TLDR
It is shown that the penalized least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in terms of the number of non-zero components of the oracle vector, in the nonparametric regression setting with random design.

Persistence in high-dimensional linear predictor selection and the virtue of overparametrization

TLDR
Under various sparsity assumptions on the optimal predictor there is “asymptotically no harm” in introducing many more explanatory variables than observations, and such practice can be beneficial in comparison with a procedure that screens in advance a small subset of explanatory variables.

Asymptotics for lasso-type estimators

We consider the asymptotic behavior of regression estimators that minimize the residual sum of squares plus a penalty proportional to Σ_j |β_j|^γ for some γ > 0. These estimators include the Lasso as a special case when γ = 1.

From Model Selection to Adaptive Estimation

TLDR
Many different model selection information criteria can be found in the literature in various contexts, including regression and density estimation; they select, from a given collection of parametric models, the model that minimizes an empirical loss plus a penalty term proportional to the dimension of the model.

A Note on the Lasso and Related Procedures in Model Selection

TLDR
It is shown that for any sample size n, when there are superfluous variables in the linear regression model and the design matrix is orthogonal, the probability that these procedures correctly identify the true set of important variables is bounded by a constant, not depending on n, that is strictly less than one.

On Model Selection Consistency of Lasso

TLDR
It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.
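
A minimal sketch of checking this condition numerically: with empirical Gram matrix C = XᵀX/n, true support S, and sign vector sign(β_S), the strong irrepresentable condition requires ‖C_{Sᶜ,S} C_{S,S}⁻¹ sign(β_S)‖_∞ ≤ 1 − η for some η > 0. The function name and the slack parameter below are illustrative assumptions.

```python
# Assumed sketch: checking the (strong) irrepresentable condition for a design X.
import numpy as np

def irrepresentable_condition_holds(X, support, sign_beta, eta=0.0):
    """True if ||C_{S^c,S} C_{S,S}^{-1} sign(beta_S)||_inf <= 1 - eta."""
    n, p = X.shape
    C = X.T @ X / n                                    # empirical Gram matrix
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(p), S)
    w = np.linalg.solve(C[np.ix_(S, S)], sign_beta)    # C_{S,S}^{-1} sign(beta_S)
    return np.max(np.abs(C[np.ix_(Sc, S)] @ w)) <= 1 - eta
```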

Sparsity and incoherence in compressive sampling

TLDR
It is shown that ℓ1 minimization recovers x0 exactly when the number of measurements exceeds Const · μ² · S · log n, where S is the sparsity level and μ is the largest entry in U properly normalized: μ(U) = √n · max_{k,j} |U_{k,j}|.
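
A minimal sketch of the coherence quantity behind this bound: for an n × n orthonormal matrix U, μ(U) = √n · max_{k,j} |U_{k,j}|, and the number of measurements is compared against μ(U)²·S·log n up to an unspecified constant; the constant used below is a placeholder.

```python
# Assumed sketch of the coherence mu(U) and the corresponding sample-size bound.
import numpy as np

def coherence(U):
    """mu(U) = sqrt(n) * max |U_{k,j}| for an n x n orthonormal matrix U."""
    n = U.shape[0]
    return np.sqrt(n) * np.max(np.abs(U))

def measurements_needed(U, S, const=1.0):
    """Illustrative bound const * mu(U)^2 * S * log(n); the constant is a placeholder."""
    n = U.shape[0]
    return const * coherence(U) ** 2 * S * np.log(n)
```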

Boosting for high-dimensional linear models

We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as exponentially in the sample size.
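
As a reading aid for this claim, a minimal sketch of componentwise L2Boosting: at each step the single predictor that best fits the current residuals is selected and its coefficient is updated by a small step. The step size nu and the iteration count M are illustrative tuning choices, not values from the cited paper.

```python
# Assumed sketch of componentwise L2Boosting for a linear model.
import numpy as np

def l2_boost(X, y, M=200, nu=0.1):
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_ss = (X ** 2).sum(axis=0)                        # per-column sums of squares
    for _ in range(M):
        coef = X.T @ resid / col_ss                      # univariate LS coefficient per column
        j = np.argmax(np.abs(coef) * np.sqrt(col_ss))    # predictor with best single fit
        beta[j] += nu * coef[j]                          # shrunken coordinate update
        resid -= nu * coef[j] * X[:, j]
    return beta
```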

Information-Theoretic Limits on Sparsity Recovery in the High-Dimensional and Noisy Setting

M. Wainwright, IEEE Transactions on Information Theory, 2009
TLDR
For a noisy linear observation model based on random measurement matrices drawn from general Gaussian ensembles, this paper derives both a set of sufficient conditions for exact support recovery using an exhaustive search decoder and a set of necessary conditions that any decoder must satisfy for exact support set recovery.
...