Ridge Regularization: An Essential Concept in Data Science

@article{Hastie2020RidgeRA,
  title={Ridge Regularization: An Essential Concept in Data Science},
  author={Trevor J. Hastie},
  journal={Technometrics},
  year={2020},
  volume={62},
  pages={426--433}
}
  • T. Hastie
  • Published 30 May 2020
  • Computer Science, Mathematics
  • Technometrics
Abstract
Ridge or, more formally, ℓ2 regularization shows up in many areas of statistics and machine learning. It is one of those essential devices that any good data scientist needs to master for their craft. In this brief ridge fest, I have collected together some of the magic and beauty of ridge that my colleagues and I have encountered over the past 40 years in applied statistics.
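Throughout, "ridge" means the ℓ2-penalized least-squares estimator β̂ = (XᵀX + λI)⁻¹Xᵀy. As a concrete anchor for the entries below, here is a minimal NumPy sketch; the data and the penalty value λ = 1.0 are made up for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
beta_true = np.zeros(10)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + 0.5 * rng.standard_normal(50)

beta_hat = ridge(X, y, lam=1.0)  # shrunk toward zero relative to OLS
```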
Comment: Ridge Regression—Still Inspiring After 50 Years
  • H. Zou
  • Computer Science, Mathematics
    Technometrics
  • 2020
TLDR
Comments will focus on two new results related to ridge regularization: response-guided principal component regression and leave-one-out analysis in kernel machines.
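The leave-one-out analysis mentioned above rests on a classical shortcut that holds exactly for ridge (and other linear smoothers): the i-th leave-one-out residual equals the ordinary residual divided by 1 − Hii, where H = X(XᵀX + λI)⁻¹Xᵀ is the hat matrix. A minimal sketch on synthetic data:

```python
import numpy as np

def loo_residuals_ridge(X, y, lam):
    """Leave-one-out residuals via the hat-matrix shortcut:
    e_loo[i] = e[i] / (1 - H[i, i]),  H = X (X'X + lam*I)^{-1} X'."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    e = y - H @ y
    return e / (1.0 - np.diag(H))

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + rng.standard_normal(40)
loo_mse = np.mean(loo_residuals_ridge(X, y, lam=0.5) ** 2)  # LOO-CV error from one fit, not n
```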
Comment: Ridge Regression and Regularization of Large Matrices
We view ridge regression through the lens of eigenvalue shrinkage, and consider its influence on two modern problems in high-dimensional statistical inference: covariance estimation and community detection.
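The eigenvalue-shrinkage view can be made explicit with the SVD X = UDVᵀ: the ridge estimator is V diag(dj/(dj² + λ)) Uᵀy, so the j-th principal component of the fit is shrunk by the factor dj²/(dj² + λ). A quick numerical check on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 8))
y = rng.standard_normal(30)
lam = 2.0

# Direct ridge solution.
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(8), X.T @ y)

# Equivalent SVD form: beta = V diag(d / (d^2 + lam)) U'y.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))

assert np.allclose(beta_direct, beta_svd)  # same estimator; shrinkage made explicit
```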
Can’t Ridge Regression Perform Variable Selection?
  • Yichao Wu
  • Computer Science, Mathematics
    Technometrics
  • 2021
TLDR
A new variable selection method based on individually penalized ridge regression, a slightly generalized version of ridge regression, is proposed and shown to perform competitively in a simulation study and a real-data example.
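The details of Wu's selection procedure are in the paper, but the basic device is easy to state: give each coefficient its own penalty λj, replacing λI with diag(λ1, …, λp) in the closed form. The sketch below illustrates that idea only; the penalty values are arbitrary, not Wu's tuning rule.

```python
import numpy as np

def individually_penalized_ridge(X, y, lams):
    """Ridge with one penalty per coefficient:
    beta = (X'X + diag(lams))^{-1} X'y.
    Sending lams[j] -> infinity forces beta[j] -> 0, which is what
    lets a per-coefficient penalty perform variable selection."""
    return np.linalg.solve(X.T @ X + np.diag(lams), X.T @ y)

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 4))
y = X @ np.array([3.0, 0.0, -1.5, 0.0]) + rng.standard_normal(60)

# Heavily penalize coefficients 2 and 4, lightly penalize the rest.
beta = individually_penalized_ridge(X, y, lams=np.array([0.1, 1e6, 0.1, 1e6]))
```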
InfoGram and Admissible Machine Learning
  • Subhadeep Mukhopadhyay
  • Computer Science, Mathematics
    ArXiv
  • 2021
TLDR
A new information-theoretic learning framework (admissible machine learning) and algorithmic risk-management tools (InfoGram, L-features, ALFA-testing) are introduced that can guide an analyst in redesigning off-the-shelf ML methods to be regulatory compliant while maintaining good prediction accuracy.
DeepShadows: Separating Low Surface Brightness Galaxies from Artifacts using Deep Learning
TLDR
This work investigates the use of convolutional neural networks (CNNs) for the problem of separating LSBGs from artifacts in survey images and demonstrates that CNNs offer a very promising path in the quest to study the low-surface-brightness universe.
Anthropogenic influence on extreme precipitation over global land areas seen in multiple observational datasets
The intensification of extreme precipitation under anthropogenic forcing is robustly projected by global climate models, but highly challenging to detect in the observational record. Large internal
TLDR
A physically interpretable anthropogenic signal is found that is detectable in all global observational datasets, robustly projected by global climate models, and capable of identifying the time evolution of the spatial patterns.
Using Machine Learning to Understand Veterans' Receipt of Loans in the Paycheck Protection Program
This paper provides the first quantitative investigation of the receipt of funds from the Paycheck Protection Program (PPP) among Veterans between April and June. We find that Veterans received 3.5%
Semiparametric Portfolios: Improving Portfolio Performance by Exploiting Non-Linearities in Firm Characteristics
We present a semiparametric portfolio optimization method in which portfolio weights are parameterized as a non-linear function of firm characteristics. This approach generalizes the linear
A tutorial on individualized treatment effect prediction from randomized trials with a binary endpoint.
TLDR
The causal structure of the individualized treatment effect is laid out in terms of potential outcomes, and the assumptions required for a causal interpretation of its prediction are described, including logistic-regression-based methods that are well known and naturally provide the required probabilistic estimates.

References

SHOWING 1-10 OF 39 REFERENCES
Regularization and variable selection via the elastic net
Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a
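The elastic net adds an ℓ1 term to the ridge penalty, giving sparsity while keeping ridge's grouping of correlated predictors. A minimal scikit-learn call on synthetic data (`alpha` scales the total penalty; `l1_ratio` sets the ℓ1/ℓ2 mix):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 20))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(100)

# Penalty: alpha * (l1_ratio * ||b||_1 + 0.5 * (1 - l1_ratio) * ||b||_2^2)
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.sum(model.coef_ != 0), "nonzero coefficients")  # sparse, like the lasso
```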
Statistical Learning with Sparsity: The Lasso and Generalizations
TLDR
Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.
Group lasso with overlap and graph lasso
TLDR
A new penalty function is proposed which, when used as regularization in empirical risk minimization, leads to sparse estimators; theoretical properties of the estimator are studied, and the method is illustrated on simulated data and breast-cancer gene expression data.
Efficient quadratic regularization for expression arrays.
TLDR
This article exposes a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks, and shows that dramatic computational savings are possible over naive implementations.
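The computational savings come from a standard identity: (XᵀX + λIp)⁻¹Xᵀ = Xᵀ(XXᵀ + λIn)⁻¹, so when p ≫ n (thousands of genes, tens of arrays) ridge needs only an n × n solve rather than a p × p one. A sketch at expression-array scale:

```python
import numpy as np

def ridge_wide(X, y, lam):
    """Ridge for p >> n via the kernel identity:
    (X'X + lam*I_p)^{-1} X'y = X'(X X' + lam*I_n)^{-1} y.
    Costs O(n^2 p + n^3) instead of O(p^3)."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

rng = np.random.default_rng(5)
X = rng.standard_normal((50, 5000))  # n = 50 samples, p = 5000 genes
y = rng.standard_normal(50)
beta = ridge_wide(X, y, lam=1.0)     # a 50x50 solve, not 5000x5000
```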
Reconciling modern machine learning practice and the bias-variance trade-off
TLDR
This paper reconciles the classical understanding and the modern practice within a unified performance curve that subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance.
Computer Age Statistical Inference: Algorithms, Evidence, and Data Science
TLDR
This book takes an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s, with speculation on the future direction of statistics and data science.
Regression Shrinkage and Selection via the Lasso
SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a
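In the special case of an orthonormal design, the lasso solution is just soft thresholding of the least-squares coefficients, which is the simplest way to see how the ℓ1 constraint sets coefficients exactly to zero. A sketch of that special case (illustrative numbers):

```python
import numpy as np

def soft_threshold(z, t):
    """Lasso solution for orthonormal X: shrink toward zero, clip at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

beta_ols = np.array([3.0, -0.4, 1.2, 0.1])
print(soft_threshold(beta_ols, t=0.5))  # [ 2.5 -0.   0.7  0. ] -- exact zeros
```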
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
TLDR
This paper recovers---in a precise quantitative way---several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.
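Here "ridgeless" means the minimum-ℓ2-norm interpolator, the λ → 0 limit of ridge; with p > n it fits the training data exactly. A quick sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((20, 50))  # p > n: infinitely many interpolating solutions
y = rng.standard_normal(20)

beta = np.linalg.pinv(X) @ y       # minimum-norm solution = lim_{lam -> 0} ridge
assert np.allclose(X @ beta, y)    # interpolates the training data exactly
```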
Ridge Regression: Biased Estimation for Nonorthogonal Problems
TLDR
The ridge trace, a method for displaying in two dimensions the effects of nonorthogonality, is introduced, along with a way to augment X′X to obtain biased estimates with smaller mean squared error.
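The ridge trace is simply the coefficient path plotted against the penalty; coefficients of nearly collinear predictors stabilize as λ grows. A minimal sketch with synthetic collinear data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
x1 = rng.standard_normal(40)
X = np.column_stack([x1, x1 + 0.05 * rng.standard_normal(40)])  # nearly collinear
y = X @ np.array([1.0, 1.0]) + 0.3 * rng.standard_normal(40)

lams = np.logspace(-3, 3, 100)
path = np.array([np.linalg.solve(X.T @ X + l * np.eye(2), X.T @ y) for l in lams])

plt.semilogx(lams, path)  # the ridge trace: coefficients vs. penalty
plt.xlabel("lambda")
plt.ylabel("coefficient")
plt.show()
```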
Dropout Training as Adaptive Regularization
TLDR
By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
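The mechanics of dropout fit in a few lines: at each SGD step, randomly zero each feature and rescale the survivors. For least squares, Wager et al. show this behaves approximately like a ridge penalty scaled per-feature by its second moment. The sketch below illustrates that mechanism on made-up data; it is not their semi-supervised algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)

p_keep, lr = 0.8, 0.01
beta = np.zeros(10)
for _ in range(2000):
    i = rng.integers(200)
    mask = rng.random(10) < p_keep
    x = X[i] * mask / p_keep            # dropout: zero features, rescale so E[x] is unchanged
    beta -= lr * (x @ beta - y[i]) * x  # SGD step on the squared error

# Approximately ridge with per-feature penalty ~ (1 - p_keep) / p_keep
# times the feature's second moment (Wager, Wang & Liang, 2013).
```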