# Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR.

```bibtex
@article{Bondell2008SimultaneousRS,
  title   = {Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR},
  author  = {Howard D. Bondell and Brian J. Reich},
  journal = {Biometrics},
  year    = {2008},
  volume  = {64},
  number  = {1},
  pages   = {115--123}
}
```

Variable selection can be challenging, particularly in situations with a large number of predictors with possibly high correlations, such as gene expression data. In this article, a new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters. In addition to improving prediction accuracy and interpretation, these resulting groups can then be investigated further to discover…
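The OSCAR constraint combines an L1 norm (for sparsity) with a pairwise L-infinity norm (for grouping). A minimal sketch of that penalty and a direct numerical fit is below; the data, the penalty weights `lam` and `c`, and the use of a generic optimizer are all illustrative assumptions, not the algorithm from the paper.

```python
# Sketch of an OSCAR-style penalized least squares (illustrative only).
import numpy as np
from scipy.optimize import minimize

def oscar_penalty(beta, c=1.0):
    """L1 norm plus pairwise L-infinity: sum_j |b_j| + c * sum_{j<k} max(|b_j|, |b_k|)."""
    a = np.abs(beta)
    pair = sum(max(a[j], a[k])
               for j in range(len(a)) for k in range(j + 1, len(a)))
    return a.sum() + c * pair

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.standard_normal((n, p))
# Two predictors share the same true coefficient; OSCAR tends to fuse them.
beta_true = np.array([2.0, 2.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 1.0  # overall penalty weight (assumed, for illustration)
obj = lambda b: 0.5 * np.sum((y - X @ b) ** 2) + lam * oscar_penalty(b, c=2.0)
fit = minimize(obj, np.zeros(p), method="Nelder-Mead", options={"maxiter": 5000})
print(np.round(fit.x, 2))
```

The paper itself solves this via a quadratic-programming formulation along the constraint path; the generic Nelder-Mead call here only illustrates the shape of the objective.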

## 417 Citations

Regression shrinkage and grouping of highly correlated predictors with HORSES

- Mathematics
- 2013

Identifying homogeneous subgroups of variables can be challenging in high-dimensional data analysis with highly correlated predictors. We propose a new method called Hexagonal Operator for Regression…

Regularization and Estimation in Regression with Cluster Variables

- Mathematics
- 2014

Clustering Lasso, a new regularization method for linear regression, is proposed in this paper. The Clustering Lasso can select variables while preserving the correlation structure among variables. In…

The Cluster Elastic Net for High-Dimensional Regression With Unknown Variable Grouping

- Mathematics, Computer Science
- Technometrics
- 2014

This work proposes the cluster elastic net, which selectively shrinks the coefficients for such variables toward each other, rather than toward the origin, in the high-dimensional regression setting.

Consistent Group Identification and Variable Selection in Regression With Correlated Predictors

- Computer Science, Medicine
- Journal of Computational and Graphical Statistics
- 2013

A penalization procedure is proposed that performs variable selection while clustering groups of predictors automatically, and compares favorably with existing selection approaches in both prediction accuracy and model discovery, while retaining its computational efficiency.

A Bayesian Approach to Multicollinearity and the Simultaneous Selection and Clustering of Predictors in Linear Regression

- Mathematics
- 2011

High correlation among predictors has long been an annoyance in regression analysis. The crux of the problem is that the linear regression model assumes each predictor has an independent effect on…

High-Dimensional Regression and Variable Selection Using CAR Scores

- Mathematics
- 2011

Variable selection is a difficult problem that is particularly challenging in the analysis of high-dimensional genomic data. Here, we introduce the CAR score, a novel and highly effective criterion…

MCEN: a method of simultaneous variable selection and clustering for high-dimensional multinomial regression

- Computer Science
- Statistics and Computing
- 2020

A novel penalty function incorporating both regression coefficients and pairwise correlations is used to define clusters of variables, providing a one-stop solution that simultaneously selects and groups the important variables associated with the different classes of a multinomial response.

Penalized regression combining the L1 norm and a correlation based penalty.

- Mathematics
- 2014

We consider the problem of feature selection in the linear regression model with p covariates and n observations. We propose a new method to simultaneously select variables and favor a grouping effect,…

An extended variable inclusion and shrinkage algorithm for correlated variables

- Mathematics, Computer Science
- Computational Statistics &amp; Data Analysis
- 2013

A new method is proposed that simultaneously selects variables and encourages a grouping effect, in which strongly correlated predictors tend to enter or leave the model together; it selects a sparse model while avoiding the over-shrinkage of a Lasso-type estimator.

Group variable selection for data with dependent structures

- Mathematics
- 2012

Variable selection methods have been widely used in the analysis of high-dimensional data, for example, gene expression microarray data and single nucleotide polymorphism data. A special feature of…

## References

SHOWING 1-10 OF 20 REFERENCES

Model selection and estimation in regression with grouped variables

- Mathematics
- 2006

Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor…

Regression Shrinkage and Selection via the Lasso

- Mathematics
- 1996

Summary. We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a…
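The L1 constraint described in this summary is what drives some coefficients exactly to zero. A minimal sketch with scikit-learn's `Lasso` on simulated data (the data and the penalty strength `alpha=0.1` are illustrative assumptions):

```python
# Sketch of the lasso: least squares with an L1 penalty that zeroes out
# weak coefficients (illustrative data, not from the paper).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
# Only the first and fourth predictors carry signal.
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + 0.1 * rng.standard_normal(100)

model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))  # noise predictors are shrunk exactly to zero
```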

Finding predictive gene groups from microarray data

- Mathematics
- 2004

Microarray experiments generate large datasets with expression values for thousands of genes, but no more than a few dozen samples. A challenging task with these data is to reveal groups of…

Simultaneous Gene Clustering and Subset Selection for Sample Classification Via MDL

- Mathematics, Computer Science
- Bioinformatics
- 2003

An algorithm for the simultaneous clustering of genes and subset selection of gene clusters for sample classification is presented and a new model selection criterion based on Rissanen's MDL (minimum description length) principle is developed.

Sparsity and smoothness via the fused lasso

- Mathematics
- 2005

Summary. The lasso penalizes a least squares regression by the sum of the absolute values (L1-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients…

Regularization and variable selection via the elastic net

- Mathematics
- 2005

Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a…
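The elastic net's blend of L1 and L2 penalties is what gives it the grouping behavior relevant to OSCAR: correlated predictors receive similar coefficients instead of one being arbitrarily dropped. A minimal sketch with scikit-learn's `ElasticNet` (data and penalty settings are illustrative assumptions):

```python
# Sketch of the elastic net: L1 for sparsity plus L2 for grouping, so a
# highly correlated pair of predictors shares the signal (illustrative data).
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
z = rng.standard_normal(200)
# x0 and x1 are near-duplicates of the same latent signal.
X = np.column_stack([z + 0.01 * rng.standard_normal(200),
                     z + 0.01 * rng.standard_normal(200),
                     rng.standard_normal(200)])
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(200)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 2))  # the correlated pair gets similar coefficients
```

A pure lasso on the same data would tend to pick one of the two correlated columns; the quadratic term makes the split between them unique and roughly equal.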

Supervised harvesting of expression trees

- Biology, Medicine
- Genome Biology
- 2000

The procedure is found to require a large number of experimental samples to successfully discover interactions, but it is a potentially useful tool for exploring gene expression data and identifying interesting clusters of genes worthy of further investigation.

Averaged gene expressions for regression.

- Mathematics, Medicine
- Biostatistics
- 2007

By averaging the genes within the clusters obtained from hierarchical clustering, supergenes are defined and used to fit regression models, thereby attaining concise interpretation and accuracy in regression of DNA microarray data.

Ridge regression: biased estimation for nonorthogonal problems

- Mathematics
- 2000

In multiple regression it is shown that parameter estimates based on minimum residual sum of squares have a high probability of being unsatisfactory, if not incorrect, if the prediction vectors are…
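The instability described here is easy to reproduce: with near-collinear predictors, ordinary least squares estimates blow up, while a small L2 (ridge) penalty stabilizes them. A minimal sketch (the data and `alpha=1.0` are illustrative assumptions):

```python
# Sketch of ridge regression: an L2 penalty stabilizes coefficient estimates
# when predictor columns are nearly collinear (illustrative data).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)
z = rng.standard_normal(30)
# Two nearly identical columns make the design matrix ill-conditioned.
X = np.column_stack([z, z + 1e-3 * rng.standard_normal(30)])
y = z + 0.1 * rng.standard_normal(30)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print(np.round(ols.coef_, 1))    # wildly large, opposite-signed estimates
print(np.round(ridge.coef_, 2))  # small, nearly equal, stable estimates
```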

Piecewise linear regularized solution paths

- Mathematics
- 2007

We consider the generic regularized optimization problem β̂(λ) = argmin_β L(y, Xβ) + λJ(β). Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407-499] have shown that for the LASSO-that…