Correlated variables in regression: Clustering and sparse estimation

@article{Buhlmann2013CorrelatedVI,
  title={Correlated variables in regression: Clustering and sparse estimation},
  author={Peter B{\"u}hlmann and Philipp R{\"u}timann and Sara A. van de Geer and Cun-Hui Zhang},
  journal={Journal of Statistical Planning and Inference},
  year={2013},
  volume={143},
  pages={1835--1858}
}
We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and do subsequent sparse estimation such as the Lasso for cluster-representatives or the group Lasso based on the structure from the clusters. Regarding the first step, we present a novel and bottom-up agglomerative clustering algorithm based on canonical correlations, and we show that it finds an optimal solution and is statistically consistent. We also…
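The two-step idea in the abstract (cluster correlated variables, then run the Lasso on cluster representatives) can be sketched with standard tools. This is only an illustrative stand-in, not the paper's method: ordinary correlation-based average-linkage clustering replaces the authors' canonical-correlation algorithm, scikit-learn's `Lasso` is used for the second step, and the data, cluster count, and `alpha` are arbitrary choices for the demo.

```python
# Sketch of the "cluster, then sparsify" pipeline described in the abstract.
# Assumption: plain (1 - |correlation|) average-linkage clustering stands in
# for the paper's canonical-correlation-based agglomerative algorithm.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 12
# Three latent factors, each replicated four times -> strongly correlated blocks.
Z = rng.normal(size=(n, 3))
X = np.repeat(Z, 4, axis=1) + 0.1 * rng.normal(size=(n, p))
y = Z[:, 0] - 2 * Z[:, 1] + rng.normal(size=n)

# Step 1: agglomerative clustering with (1 - |correlation|) as dissimilarity.
corr = np.corrcoef(X, rowvar=False)
dist = 1 - np.abs(corr)
condensed = dist[np.triu_indices(p, 1)]          # condensed form for linkage
labels = fcluster(linkage(condensed, method="average"),
                  t=3, criterion="maxclust")

# Step 2: Lasso on one representative per cluster (here, the cluster mean).
reps = np.column_stack([X[:, labels == k].mean(axis=1)
                        for k in np.unique(labels)])
coef = Lasso(alpha=0.1).fit(reps, y).coef_
print(coef)  # one coefficient per cluster, not per original variable
```

The point of the construction is visible in the shapes: the Lasso is fit on one column per cluster, so strongly correlated variables enter or leave the model together rather than being arbitrarily split by the ℓ1 penalty.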

Citations

Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models
This paper introduces the Adaptive Cluster Lasso method for variable selection in high-dimensional sparse regression models with strongly correlated variables, and shows that the procedure is consistent and efficient in finding the true underlying population group structure.
Regularization and Estimation in Regression with Cluster Variables
The Clustering Lasso, a new regularization method for linear regression, is proposed in this paper. The Clustering Lasso can select variables while preserving the correlation structure among variables. In…
A Cluster Elastic Net for Multivariate Regression
A coordinate descent algorithm for both the normal and binomial likelihood, which can easily be extended to other generalized linear model (GLM) settings, is presented.
Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models
This work considers variable selection problems in high-dimensional sparse regression models with strongly correlated variables, and proposes to use the Elastic-net as a pre-selection step for Cluster Lasso methods (i.e., Cluster Group Lasso and Cluster Representative Lasso).
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
This work presents a new variable selection procedure, the Dual Lasso Selector, argues that correlation among active predictors is not problematic, and derives a new, weaker condition on the design matrix, called the Pseudo Irrepresentable Condition (PIC).
Stability Feature Selection using Cluster Representative LASSO
This work proposes to cluster the variables first and then do stability feature selection using the Lasso for cluster representatives, and finds an optimal and consistent solution for group variable selection in the high-dimensional regression setting.
The Cluster Elastic Net for High-Dimensional Regression With Unknown Variable Grouping
This work proposes the cluster elastic net, which selectively shrinks the coefficients for such variables toward each other, rather than toward the origin, in the high-dimensional regression setting.
Split Regularized Regression
In general the proposed method improves the prediction accuracy of the base estimator used in the procedure, and the consistency of the method is established with the number of predictors possibly increasing with the sample size.
Pre-processing with Orthogonal Decompositions for High-dimensional Explanatory Variables
Strong correlations between explanatory variables are problematic for high-dimensional regularized regression methods. Due to the violation of the Irrepresentable Condition, the popular LASSO method…
The cluster graphical lasso for improved estimation of Gaussian graphical models
The cluster graphical lasso, which involves clustering the features using an alternative to single-linkage clustering and then performing the graphical lasso on the subset of variables within each cluster, is proposed.

References

Showing 1–10 of 42 references
High-dimensional graphs and variable selection with the Lasso
The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at…
Mimeo Series No. 2583: Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR
In this paper, a new method called the OSCAR (Octagonal Shrinkage and Clustering Algorithm for Regression) is proposed to simultaneously select variables and perform supervised clustering in the…
Sparse regression with exact clustering
This dissertation deals with three closely related topics of the lasso in addition to supplying a comprehensive overview of the rapidly growing literature in this field. The first part aims at…
Model selection and estimation in regression with grouped variables
Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…
Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR.
A new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters, in addition to improving prediction accuracy and interpretation.
Regression Shrinkage and Selection via the Lasso
Summary. We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a…
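The criterion this summary describes can be written out explicitly; a standard statement of the lasso in its constrained form is:

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2
  \quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t ,
```

where $t \ge 0$ is a tuning parameter controlling the amount of shrinkage; the equivalent penalized form adds $\lambda \sum_j \lvert \beta_j \rvert$ to the residual sum of squares.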
The Sparse Laplacian Shrinkage Estimator for High-Dimensional Regression.
It is shown that the SLS possesses an oracle property, in the sense that it is selection consistent and equal to the oracle Laplacian shrinkage estimator with high probability in sparse, high-dimensional settings with p ≫ n under reasonable conditions.
A Sparse-Group Lasso
For high-dimensional supervised learning problems, using problem-specific assumptions can often lead to greater accuracy. For problems with grouped covariates, which are believed to have sparse…
The sparsity and bias of the Lasso selection in high-dimensional linear regression
Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436–1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent,…
Relaxed Lasso
It is shown that the contradicting demands of an efficient computational procedure and fast convergence rates of the ℓ2-loss can be overcome by a two-stage procedure, termed the relaxed Lasso.