Corpus ID: 219179362

Feature-weighted elastic net: using "features of features" for better prediction

J. Kenneth Tay, Nima Aghaeepour, Trevor J. Hastie, Robert Tibshirani
In some supervised learning settings, the practitioner might have additional information on the features used for prediction. We propose a new method which leverages this additional information for better prediction. The method, which we call the feature-weighted elastic net ("fwelnet"), uses these "features of features" to adapt the relative penalties on the feature coefficients in the elastic net penalty. In our simulations, fwelnet outperforms the lasso in terms of test mean squared error… 
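The idea of letting "features of features" adapt the relative penalties can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the weight formula, so the exponential form below, and the names `penalty_weights`, `weighted_enet_penalty`, `Z` and `theta`, are hypothetical choices made for the example. `Z` holds one row of metadata per feature, and `theta` controls how strongly that metadata shifts the penalty.

```python
import math


def penalty_weights(Z, theta):
    """Per-feature penalty weights from a p x K table of feature metadata.

    Assumed form (not taken from the abstract): with score s_j = z_j . theta,
    w_j = mean_l(exp(s_l)) / exp(s_j). A higher score means a lighter penalty,
    and theta = 0 gives w_j = 1 for every feature (the ordinary elastic net).
    """
    scores = [sum(z_k * t_k for z_k, t_k in zip(z, theta)) for z in Z]
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    mean_exp = sum(exps) / len(exps)
    return [mean_exp / e for e in exps]


def weighted_enet_penalty(beta, weights, lam, alpha):
    """Elastic net penalty with one multiplicative weight per coefficient."""
    return lam * sum(
        w * (alpha * abs(b) + 0.5 * (1.0 - alpha) * b * b)
        for w, b in zip(weights, beta)
    )


# Two features, one metadata column: feature 0 is flagged, feature 1 is not.
Z = [[1.0], [0.0]]
beta = [1.0, -2.0]

flat = penalty_weights(Z, [0.0])    # no metadata effect: equal weights
tilted = penalty_weights(Z, [1.0])  # flagged feature is penalized less

print(flat, tilted)
print(weighted_enet_penalty(beta, flat, lam=0.5, alpha=1.0))
```

In a full method the coefficients and `theta` would both be fitted to data; here `theta` is fixed by hand purely to show how the weights move.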


Fast marginal likelihood estimation of penalties for group-adaptive elastic net
A fast method for marginal likelihood estimation of group-adaptive elastic net penalties for generalised linear models that substantially decreases computation time and outperforms or matches other methods by learning from co-data is presented.
Characterization of the treatment-naive immune microenvironment in melanoma with BRAF mutation
Treatment-naive BRAF-mutant melanoma has a distinct immune context compared with BRAF-wt melanoma, with significantly decreased CD8+ T cells and increased B cells and CD4- T cells in the tumor microenvironment.
ecpc: An R-package for generic co-data models for high-dimensional prediction
High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior
Group-regularized ridge regression via empirical Bayes noise level cross-validation.
Features in predictive models are not exchangeable, yet common supervised models treat them as such. Here we study ridge regression when the analyst can partition the features into $K$ groups based


Regularization and variable selection via the elastic net
It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS‐EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
Adaptive penalization in high-dimensional regression and classification with external covariates using variational Bayes
This work presents a method that differentially penalizes feature groups defined by the covariates and adapts the relative strength of penalization to the information content of each group, and extends the range of applications of penalized regression, improves model interpretability and can improve prediction performance.
Feature selection guided by structural information
This paper proposes to extend the elastic net by admitting general nonnegative quadratic constraints as a second form of regularization, and provides an analog to the so-called ‘irrepresentable condition’ which holds for the lasso.
Weighted Lasso with Data Integration
Through simulations, it is shown that the weighted lasso with integrated relevant external information on the covariates outperforms the lasso and the adaptive lasso, in terms of both variable selection and prediction, when the external information ranges from relevant to partly relevant.
Better prediction by use of co‐data: adaptive group‐regularized ridge regression
A method for adaptive group‐regularized (logistic) ridge regression that makes structural use of ‘co‐data’ is presented; it improves on the predictive performance of ordinary logistic ridge regression and the group lasso, and derives empirical Bayes estimates of group‐specific penalties.
Incorporating prior knowledge of predictors into penalized classifiers with multiple penalty terms
Simulated and real data examples demonstrate that, if prior knowledge on gene grouping is indeed informative, the new methods perform better than the two standard penalized methods, yielding higher predictive accuracy and screening out more irrelevant genes.
Regularising Non-linear Models Using Feature Side-information
This paper proposes a framework that allows for the incorporation of feature side-information during the learning of very general model families to improve the prediction performance, and controls the structures of the learned models so that they reflect feature similarities as these are defined on the basis of the side-information.
IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data
This paper proposes a simple penalized regression method, called IPF-LASSO (Integrative LASSO with Penalty Factors), which is implemented in the R package ipflasso and illustrated through applications to two real-life cancer datasets.
Sparsity and smoothness via the fused lasso
The fused lasso is proposed, a generalization that is designed for problems with features that can be ordered in some meaningful way, and is especially useful when the number of features p is much greater than N, the sample size.
Practical Bayesian Optimization of Machine Learning Algorithms
This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.