On Model Selection Consistency of Lasso
  • P. Zhao, Bin Yu
  • Mathematics, Computer Science
  • J. Mach. Learn. Res.
  • 1 December 2006
It is proved that a single condition, called the Irrepresentable Condition, is almost necessary and sufficient for the Lasso to select the true model, both in the classical fixed-p setting and in the large-p setting as the sample size n grows.
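As an illustration (our own sketch, not code from the paper), the population-level version of the condition can be checked directly; the function name and toy covariance matrices below are assumptions for the example:

```python
import numpy as np

def irrepresentable_value(Sigma, support, signs):
    """Value of || Sigma_{S^c,S} (Sigma_{S,S})^{-1} sign(beta_S) ||_inf.
    The (strong) Irrepresentable Condition requires this to be < 1."""
    p = Sigma.shape[0]
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(p), S)
    v = Sigma[np.ix_(Sc, S)] @ np.linalg.solve(Sigma[np.ix_(S, S)],
                                               np.asarray(signs, float))
    return float(np.max(np.abs(v)))

# Orthogonal design: the condition holds trivially (value 0).
ok = irrepresentable_value(np.eye(4), [0, 1], [1.0, -1.0])

# An irrelevant variable strongly correlated with both true variables violates it.
Sigma = np.array([[1.0, 0.0, 0.9],
                  [0.0, 1.0, 0.9],
                  [0.9, 0.9, 1.0]])
bad = irrepresentable_value(Sigma, [0, 1], [1.0, 1.0])
```

In the second example the irrelevant third variable "represents" the true ones too well (value 1.8 > 1), which is exactly the failure mode the condition rules out.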
A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
A unified framework is provided for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling; one main theorem is stated, and it is shown how it can be used both to re-derive several existing results and to obtain several new ones.
High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence
Given i.i.d. observations of a random vector X ∈ ℝ^p, we study the problem of estimating both its covariance matrix Σ* and its inverse covariance or concentration matrix Θ* = (Σ*)⁻¹.
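The ℓ1-penalized log-determinant objective itself is short enough to state in code. The following is a minimal proximal-gradient sketch (not the authors' algorithm), assuming the usual convention that only off-diagonal entries are penalized; the population covariance stands in for the sample covariance to keep the example deterministic:

```python
import numpy as np

def soft_threshold_offdiag(Theta, t):
    """Soft-threshold off-diagonal entries only; the diagonal is unpenalized."""
    out = np.sign(Theta) * np.maximum(np.abs(Theta) - t, 0.0)
    np.fill_diagonal(out, np.diag(Theta))
    return out

def penalized_logdet_objective(Theta, S, lam):
    """-log det(Theta) + tr(S Theta) + lam * ||Theta||_{1,off-diagonal}."""
    off = Theta - np.diag(np.diag(Theta))
    return -np.linalg.slogdet(Theta)[1] + np.trace(S @ Theta) + lam * np.abs(off).sum()

def ista_glasso(S, lam=0.05, step=0.1, iters=300):
    """Proximal gradient (ISTA) on the penalized log-determinant divergence."""
    Theta = np.eye(S.shape[0])
    for _ in range(iters):
        grad = S - np.linalg.inv(Theta)   # gradient of the smooth part
        Theta = soft_threshold_offdiag(Theta - step * grad, step * lam)
        Theta = (Theta + Theta.T) / 2     # keep the iterate symmetric
    return Theta

# Sparse tridiagonal true concentration matrix, population covariance S.
Theta_true = np.array([[2.0, 0.5, 0.0],
                       [0.5, 2.0, 0.5],
                       [0.0, 0.5, 2.0]])
S = np.linalg.inv(Theta_true)
Theta_hat = ista_glasso(S, lam=0.05)

obj0 = penalized_logdet_objective(np.eye(3), S, 0.05)
obj_hat = penalized_logdet_objective(Theta_hat, S, 0.05)
```

The iterate stays positive definite here and the penalized objective decreases from its value at the identity starting point.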
The Minimum Description Length Principle in Coding and Modeling
The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms.
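For the Bernoulli model the normalized maximized likelihood code can be computed exactly, since sequences with the same count share the same maximized likelihood. The construction below is our own small sketch, not an example from the paper:

```python
import math

def bernoulli_nml_complexity(n):
    """Parametric complexity of the Bernoulli model for n observations:
    log of the sum, over all 2^n sequences, of the maximized likelihood.
    A sequence with k ones has ML estimate theta_hat = k/n (0^0 = 1)."""
    total = sum(
        math.comb(n, k) * (k / n) ** k * (1 - k / n) ** (n - k)
        for k in range(n + 1)
    )
    return math.log(total)

def nml_probability(k, n):
    """NML probability of one particular sequence containing k ones."""
    ml = (k / n) ** k * (1 - k / n) ** (n - k)
    return ml / math.exp(bernoulli_nml_complexity(n))

n = 10
# The NML code is a genuine distribution: summing over all sequences gives 1.
total_prob = sum(math.comb(n, k) * nml_probability(k, n) for k in range(n + 1))
complexity = bernoulli_nml_complexity(n)
```

The code length −log nml_probability(k, n) equals the maximized negative log-likelihood plus the complexity term, which is the stochastic-complexity decomposition the abstract refers to.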
Spectral clustering and the high-dimensional stochastic blockmodel
Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are …
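A minimal toy sketch of spectral clustering on a two-block stochastic blockmodel (our own construction, not the paper's estimator): with two assortative blocks, the sign pattern of the eigenvector for the second-largest adjacency eigenvalue recovers the block labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def sbm_adjacency(sizes, p_in, p_out, rng):
    """Sample a symmetric, hollow adjacency matrix from a stochastic blockmodel."""
    n = sum(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    A = (rng.random((n, n)) < P).astype(float)
    A = np.triu(A, 1)
    return A + A.T, labels

A, labels = sbm_adjacency([50, 50], p_in=0.5, p_out=0.05, rng=rng)

# Spectral step: eigenvector of the second-largest eigenvalue splits the blocks.
eigvals, eigvecs = np.linalg.eigh(A)
v = eigvecs[:, np.argsort(eigvals)[-2]]
pred = (v > 0).astype(int)

# Accuracy up to the unavoidable label swap.
acc = max(np.mean(pred == labels), np.mean(pred != labels))
```

With this strong separation (p_in = 0.5 vs p_out = 0.05) the recovered partition is essentially exact; the regime the paper studies is the harder one where the gap shrinks as the network grows.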
The Lasso [28] is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables p is potentially much larger than the number of observations n.
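A toy p ≫ n illustration of that variable-selection behavior, using a basic proximal-gradient (ISTA) Lasso solver of our own rather than any implementation from the paper:

```python
import numpy as np

def lasso_ista(X, y, lam, iters=2000):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by proximal gradient."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ b - y) / n
        b = b - step * grad
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # soft-threshold
    return b

rng = np.random.default_rng(1)
n, p = 50, 200                       # far more predictors than observations
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]          # only three truly active predictors
y = X @ beta + 0.1 * rng.standard_normal(n)

b_hat = lasso_ista(X, y, lam=0.2)
support = set(np.flatnonzero(b_hat).tolist())
```

Despite p = 200 > n = 50, the ℓ1 penalty zeroes out almost all coefficients and keeps the three active predictors.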
Model Selection and the Principle of Minimum Description Length
This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL …
Analyzing Bagging
Bagging is one of the most effective computationally intensive procedures for improving unstable estimators or classifiers, and is especially useful for high-dimensional data problems. Here we formalize …
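As a hedged illustration of the smoothing effect bagging has on an unstable base learner, the sketch below bags a one-split regression stump over bootstrap resamples; the stump and the test problem are our own choices, not an example from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_stump(x, y):
    """Fit a one-split regression stump: a threshold and two leaf means."""
    best = (np.inf, None)
    for s in np.unique(x):
        left, right = y[x <= s], y[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, (s, left.mean(), right.mean()))
    return best[1]

def predict_stump(stump, x):
    s, lo, hi = stump
    return np.where(x <= s, lo, hi)

def bagged_predict(x_train, y_train, x_test, n_boot=50):
    """Average stump predictions over bootstrap resamples of the training set."""
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x_train), len(x_train))
        preds.append(predict_stump(fit_stump(x_train[idx], y_train[idx]), x_test))
    return np.mean(preds, axis=0)

# Smooth target: a single hard-split stump is a crude, high-variance fit.
x_train = rng.uniform(-1, 1, 80)
y_train = np.sin(2 * x_train) + 0.3 * rng.standard_normal(80)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(2 * x_test)

mse_single = np.mean((predict_stump(fit_stump(x_train, y_train), x_test) - y_test) ** 2)
mse_bagged = np.mean((bagged_predict(x_train, y_train, x_test) - y_test) ** 2)
```

Averaging over resamples replaces the single hard threshold with a soft blend of many thresholds, which is the variance-reduction mechanism the analysis formalizes.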
Statistical guarantees for the EM algorithm: From population to sample-based analysis
A general framework is developed for proving rigorous guarantees on the performance of the EM algorithm and of a variant known as gradient EM, and the consequences of the general theory are worked out for three canonical examples of incomplete-data problems.
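One canonical incomplete-data example of this kind, a balanced two-component Gaussian mixture with known unit variances, admits a very short EM loop. This is a generic sketch of sample-based EM under that setup, not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(3)

def em_two_gaussians(x, mu0, iters=50):
    """EM for a balanced 1-D two-component Gaussian mixture with unit variances,
    estimating only the two component means."""
    mu1, mu2 = mu0
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point.
        l1 = np.exp(-0.5 * (x - mu1) ** 2)
        l2 = np.exp(-0.5 * (x - mu2) ** 2)
        r = l1 / (l1 + l2)
        # M-step: responsibility-weighted sample means.
        mu1 = np.sum(r * x) / np.sum(r)
        mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
    return mu1, mu2

# Well-separated mixture; initialization on the correct side of the origin.
n = 2000
z = rng.integers(0, 2, n)
x = np.where(z == 0, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))
mu1, mu2 = em_two_gaussians(x, mu0=(-1.0, 1.0))
```

With this separation and initialization, the sample-based iterates settle near the true means (±2), matching the population-to-sample picture the framework makes precise.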