# Correlated variables in regression: Clustering and sparse estimation

@article{Buhlmann2013CorrelatedVI, title={Correlated variables in regression: Clustering and sparse estimation}, author={Peter Buhlmann and Philipp Rutimann and Sara A. van de Geer and Cun-Hui Zhang}, journal={Journal of Statistical Planning and Inference}, year={2013}, volume={143}, pages={1835-1858} }

We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and do subsequent sparse estimation such as the Lasso for cluster-representatives or the group Lasso based on the structure from the clusters. Regarding the first step, we present a novel and bottom-up agglomerative clustering algorithm based on canonical correlations, and we show that it finds an optimal solution and is statistically consistent. We also…
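The two-step idea in the abstract (cluster the variables, then run the Lasso on cluster representatives) can be illustrated with a minimal pure-Python sketch. It is not the authors' algorithm: the paper clusters on canonical correlations, whereas this sketch swaps in plain pairwise Pearson correlation with single-linkage merging and a fixed threshold. The function names (`cluster_columns`, `representatives`, `lasso`), the threshold 0.8, the penalty 0.1, and the simulated data are all assumptions made for illustration.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def cluster_columns(cols, threshold=0.8):
    """Bottom-up merging (assumed stand-in for the paper's canonical-correlation
    clustering): join the two clusters with the largest cross-cluster |corr|
    until no such correlation exceeds the threshold."""
    clusters = [[j] for j in range(len(cols))]
    while len(clusters) > 1:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = max(abs(corr(cols[i], cols[j]))
                           for i in clusters[a] for j in clusters[b])
                if link > best:
                    best, pair = link, (a, b)
        if pair is None:
            break
        a, b = pair
        clusters[a] += clusters.pop(b)
    return [sorted(c) for c in clusters]

def representatives(cols, clusters):
    """One representative per cluster: the average of its member columns."""
    n = len(cols[0])
    return [[sum(cols[j][i] for j in c) / len(c) for i in range(n)]
            for c in clusters]

def soft_threshold(z, t):
    return z - t if z > t else z + t if z < -t else 0.0

def lasso(cols, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Assumes roughly centered data and fits no intercept."""
    n, p = len(y), len(cols)
    beta, resid = [0.0] * p, list(y)
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            norm = sum(v * v for v in xj) / n
            if norm == 0.0:
                continue
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + norm * beta[j]
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                beta[j] = new
    return beta

# Simulated data (hypothetical): two groups of near-duplicate variables
# plus one independent variable; only the first group drives the response.
rng = random.Random(0)
n = 200
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
noisy = lambda z: [v + rng.gauss(0, 0.1) for v in z]
cols = [noisy(z1), noisy(z1), noisy(z1), noisy(z2), noisy(z2),
        [rng.gauss(0, 1) for _ in range(n)]]
y = [2.0 * v + rng.gauss(0, 0.3) for v in z1]

clusters = cluster_columns(cols, threshold=0.8)
reps = representatives(cols, clusters)
beta = lasso(reps, y, lam=0.1)
print(clusters)
print([round(b, 3) for b in beta])
```

Averaging within a cluster is only one possible choice of representative; the group Lasso variant mentioned in the abstract would instead keep all variables and penalize each cluster as a group.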

#### 152 Citations

Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models

- Computer Science, Mathematics
- ArXiv
- 2016

This paper introduces the Adaptive Cluster Lasso method for variable selection in high-dimensional sparse regression models with strongly correlated variables and shows that the procedure is consistent and efficient in finding the true underlying population group structure.

Regularization and Estimation in Regression with Cluster Variables

- Mathematics
- 2014

The paper proposes the Clustering Lasso, a new regularization method for linear regression. The Clustering Lasso can select variables while keeping the correlation structures among variables. In…

A Cluster Elastic Net for Multivariate Regression

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2017

A coordinate descent algorithm is presented for both the normal and binomial likelihood, which can easily be extended to other generalized linear model (GLM) settings.

Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models

- Computer Science, Mathematics
- DKB/KIK@KI
- 2017

This work considers variable selection problems in high-dimensional sparse regression models with strongly correlated variables, and proposes to use the elastic net as a pre-selection step for Cluster Lasso methods (i.e. Cluster Group Lasso and Cluster Representative Lasso).

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

- Mathematics, Computer Science
- DKB/KIK@KI
- 2017

This work presents a new variable selection procedure, the Dual Lasso Selector, argues that correlation among active predictors is not problematic, and derives a new, weaker condition on the design matrix, called the Pseudo Irrepresentable Condition (PIC).

Stability Feature Selection using Cluster Representative LASSO

- Mathematics, Computer Science
- ICPRAM
- 2016

This work proposes to cluster the variables first and then do stability feature selection using the Lasso for cluster representatives, and finds an optimal and consistent solution for group variable selection in the high-dimensional regression setting.

The Cluster Elastic Net for High-Dimensional Regression With Unknown Variable Grouping

- Mathematics, Computer Science
- Technometrics
- 2014

This work proposes the cluster elastic net, which selectively shrinks the coefficients for such variables toward each other, rather than toward the origin, in the high-dimensional regression setting.

Split Regularized Regression

- Computer Science, Mathematics
- Technometrics
- 2020

In general, the proposed method improves the prediction accuracy of the base estimator used in the procedure, and its consistency is established with the number of predictors possibly increasing with the sample size.

Pre-processing with Orthogonal Decompositions for High-dimensional Explanatory Variables

- Mathematics
- 2021

Strong correlations between explanatory variables are problematic for high-dimensional regularized regression methods. Due to the violation of the Irrepresentable Condition, the popular LASSO method…

The cluster graphical lasso for improved estimation of Gaussian graphical models

- Computer Science, Mathematics
- Comput. Stat. Data Anal.
- 2015

This work proposes the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering and then performing the graphical lasso on the subset of variables within each cluster.

#### References

SHOWING 1-10 OF 42 REFERENCES

High-dimensional graphs and variable selection with the Lasso

- Mathematics
- 2006

The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at…

Mimeo Series No. 2583: Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR

- 2006

In this paper, a new method called the OSCAR (Octagonal Shrinkage and Clustering Algorithm for Regression) is proposed to simultaneously select variables and perform supervised clustering in the…

Sparse regression with exact clustering

- Mathematics
- 2008

This dissertation deals with three closely related topics of the lasso in addition to supplying a comprehensive overview of the rapidly growing literature in this field. The first part aims at…

Model selection and estimation in regression with grouped variables

- Mathematics
- 2006

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…

Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR.

- Mathematics, Medicine
- Biometrics
- 2008

A new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters, in addition to improving prediction accuracy and interpretation.

Regression Shrinkage and Selection via the Lasso

- Mathematics
- 1996

Summary. We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a…

The Sparse Laplacian Shrinkage Estimator for High-Dimensional Regression.

- Medicine, Mathematics
- Annals of statistics
- 2011

It is shown that the SLS possesses an oracle property in the sense that it is selection consistent and equal to the oracle Laplacian shrinkage estimator with high probability in sparse, high-dimensional settings with p ≫ n under reasonable conditions.

A Sparse-Group Lasso

- Mathematics
- 2013

For high-dimensional supervised learning problems, using problem-specific assumptions can often lead to greater accuracy. For problems with grouped covariates, which are believed to have sparse…

The sparsity and bias of the Lasso selection in high-dimensional linear regression

- Mathematics
- 2008

Meinshausen and Buhlmann [Ann. Statist. 34 (2006) 1436-1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent,…

Relaxed Lasso

- Computer Science
- Comput. Stat. Data Anal.
- 2007

It is shown that the contradicting demands of an efficient computational procedure and fast convergence rates of the ℓ2-loss can be overcome by a two-stage procedure, termed the relaxed Lasso.