# From safe screening rules to working sets for faster Lasso-type solvers

@article{Massias2017FromSS, title={From safe screening rules to working sets for faster Lasso-type solvers}, author={Mathurin Massias and Alexandre Gramfort and Joseph Salmon}, journal={ArXiv}, year={2017}, volume={abs/1703.07285} }

Convex sparsity-promoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few non-zero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist of solving simpler problems that only consider a subset of constraints, whose indices form the WS. Working set methods therefore involve two nested iterations: the outer loop…
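To make the two nested loops concrete, here is a minimal Python sketch for the Lasso, assuming the formulation $\frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1$. The outer loop builds a dual-feasible point $\theta$ by rescaling the residual, stops when the duality gap is small, and ranks features by $d_j = (1 - |x_j^\top\theta|)/\|x_j\|$, i.e. how far each dual constraint is from saturation. The inner loop solves the problem restricted to the working set. All function names, the doubling schedule for the working set size, and the use of ISTA as inner solver are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

# Illustrative sketch: names, doubling schedule and ISTA inner solver are assumptions.

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def duality_gap(X, y, beta, lam):
    """Duality gap of the Lasso and the rescaled-residual dual point."""
    r = y - X @ beta
    theta = r / max(lam, np.max(np.abs(X.T @ r)))  # ensures ||X^T theta||_inf <= 1
    primal = 0.5 * r @ r + lam * np.abs(beta).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((y / lam - theta) ** 2)
    return primal - dual, theta

def ista(X, y, lam, beta, n_iter=500):
    """Inner solver: proximal gradient on the restricted problem."""
    L = np.linalg.norm(X, ord=2) ** 2  # Lipschitz constant of the quadratic part
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

def working_set_lasso(X, y, lam, ws_size=10, tol=1e-6, max_outer=50):
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    norms = np.linalg.norm(X, axis=0)
    for _ in range(max_outer):
        gap, theta = duality_gap(X, y, beta, lam)
        if gap < tol:
            break
        # Small score = dual constraint close to saturation = feature likely active
        scores = (1.0 - np.abs(X.T @ theta)) / norms
        scores[beta != 0] = -np.inf  # always keep the current support
        ws_size = min(max(ws_size, 2 * np.count_nonzero(beta)), n_features)
        ws = np.argsort(scores)[:ws_size]
        beta_ws = ista(X[:, ws], y, lam, beta[ws])  # warm start on the working set
        beta = np.zeros(n_features)
        beta[ws] = beta_ws
        ws_size = min(2 * ws_size, n_features)
    return beta
```

The gap-based stopping test is what makes the outer loop sound: a small subproblem can be solved to high precision, yet the full-problem gap reveals whether features outside the working set were wrongly left out.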

## 20 Citations

### Celer: a Fast Solver for the Lasso with Dual Extrapolation

- Computer Science
- ICML
- 2018

This work proposes an extrapolation technique that builds improved dual points from a sequence of dual iterates, which enables tighter control of optimality in the stopping criterion, as well as better screening performance of Gap Safe rules.
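A sketch of the extrapolation step, assuming the Lasso setting in which dual points are obtained by rescaling residuals $r = y - X\beta$. Given the last $K+1$ residuals, weights $c$ with $\sum_k c_k = 1$ are chosen to minimize the norm of the combined successive differences, and the extrapolated residual is then rescaled into the dual feasible set. The function names and the small ridge term guarding against a singular Gram matrix are our assumptions. In a solver, one would typically keep whichever of the extrapolated and plain rescaled dual points has the larger dual objective.

```python
import numpy as np

def extrapolate_residuals(residuals, ridge=1e-10):
    """Extrapolate a sequence of K + 1 residuals (oldest first) towards its limit."""
    R = np.asarray(residuals, dtype=float)   # shape (K + 1, n)
    U = np.diff(R, axis=0)                    # K successive differences, shape (K, n)
    G = U @ U.T                               # K x K Gram matrix of the differences
    scale = np.trace(G)
    if scale == 0.0:                          # sequence already stationary
        return R[-1]
    G = G + ridge * scale * np.eye(len(G))    # small ridge (our addition) for singular G
    z = np.linalg.solve(G, np.ones(len(G)))
    c = z / z.sum()                           # affine weights: sum(c) == 1
    return c @ R[:-1]                         # weighted combination of the first K residuals

def rescale_to_dual(X, r, lam):
    """Map a residual to a dual-feasible point: ||X^T theta||_inf <= 1."""
    return r / max(lam, np.max(np.abs(X.T @ r)))
```

On a geometric sequence $r_t = r^\star + 0.9^t d$, three consecutive residuals already recover $r^\star$ up to a small numerical error, whereas the last iterate is still far from it; this is the kind of regular (VAR-like) behaviour that extrapolation exploits.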

### Safe optimization algorithms for variable selection and hyperparameter tuning

- Computer Science
- 2018

This work proposes a unified framework for identifying important structures in sparse convex optimization problems and introduces "Gap Safe screening rules", a technique that exploits the expected sparsity of the solutions to ignore some variables during the optimization process.

### Dual Extrapolation for Sparse Generalized Linear Models

- Computer Science
- ArXiv
- 2019

It is shown that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent.

### Exploiting regularity in sparse Generalized Linear Models

- Computer Science
- 2019

It is shown that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent.

### Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression

- Computer Science
- ArXiv
- 2020

Theoretical guarantees derive from a lower bound on the decrease of the objective function between two inner solver iterations and show convergence to a stationary point of the full problem; experimental results demonstrate large computational gains of the working set strategy over the full-problem solver, for both block coordinate descent and proximal gradient solvers.

### The Strong Screening Rule for SLOPE

- Computer Science
- NeurIPS
- 2020

A screening rule for SLOPE is developed by examining its subdifferential, and it is shown to generalize the strong rule for the lasso; like that rule, it is heuristic and may therefore discard predictors erroneously.

### Expanding boundaries of Gap Safe screening

- Computer Science
- J. Mach. Learn. Res.
- 2021

This work extends the existing Gap Safe screening framework by relaxing the global strong-concavity assumption on the dual cost function and exploiting local regularity properties, that is, strong concavity on well-chosen subsets of the domain.

### Greed is good: greedy optimization methods for large-scale structured problems

- Computer Science
- 2018

This dissertation shows that greedy coordinate descent and Kaczmarz methods have efficient implementations and can be faster than their randomized counterparts for certain common problem structures in machine learning. It also shows linear convergence of greedy (block) coordinate descent methods under a relaxation of strong convexity dating from 1963, the Polyak-Łojasiewicz inequality, which it revives.
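As a rough illustration of the greedy (Gauss-Southwell) selection rule, here is a sketch of greedy coordinate descent on a plain least-squares objective $\frac{1}{2}\|Ax - b\|^2$. The function name, the quadratic objective, and the exact coordinate minimization step are illustrative choices, not the dissertation's specific methods.

```python
import numpy as np

def greedy_cd_least_squares(A, b, n_iter=2000):
    """Gauss-Southwell coordinate descent on 1/2 ||Ax - b||^2 (illustrative sketch)."""
    x = np.zeros(A.shape[1])
    gram = A.T @ A                     # precomputed so each update costs O(n_features)
    grad = A.T @ (A @ x - b)
    for _ in range(n_iter):
        j = np.argmax(np.abs(grad))    # greedy rule: largest partial derivative
        if grad[j] == 0.0:
            break                      # exact stationary point reached
        step = grad[j] / gram[j, j]    # exact minimization along coordinate j
        x[j] -= step
        grad -= step * gram[:, j]      # gradient changes by -step * A^T A e_j
    return x
```

Replacing `np.argmax(np.abs(grad))` with a cyclic or uniformly random index gives the cyclic and randomized variants that greedy selection is compared against; the greedy rule pays for maintaining the full gradient in exchange for picking the most useful coordinate at each step.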

### A Fast, Principled Working Set Algorithm for Exploiting Piecewise Linear Structure in Convex Problems

- Computer Science
- ArXiv
- 2018

The theory relates subproblem size and stopping criteria to the amount of progress during each iteration of BlitzWS, a working set algorithm with useful theoretical guarantees that applies to many convex problems, including training L1-regularized models and support vector machines.

### Stable Safe Screening and Structured Dictionaries for Faster $\ell _{1}$ Regularization

- Computer Science
- IEEE Transactions on Signal Processing
- 2019

A new family of screening tests is introduced, termed stable screening, which can cope with approximation errors on the dictionary atoms while keeping the safety of the test (i.e., zero risk of rejecting atoms belonging to the solution support).

## 42 References

### Mind the duality gap: safer rules for the Lasso

- Computer Science
- ICML
- 2015

New versions of the so-called *safe rules* for the Lasso are proposed, based on duality gap considerations, that create safe test regions whose diameters converge to zero, provided that one relies on a converging solver.

### Gap Safe screening rules for sparsity enforcing penalties

- Computer Science
- J. Mach. Learn. Res.
- 2017

The proposed Gap Safe rules, so called because they rely on duality gap computation, can cope with any iterative solver but are particularly well suited to (block) coordinate descent methods.
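For the Lasso written as $\frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1$, the Gap Safe test can be sketched as follows. Build a dual-feasible $\theta$ by rescaling the residual and compute the duality gap $G$. Since the dual is $\lambda^2$-strongly concave, the dual optimum lies in the sphere of radius $\sqrt{2G}/\lambda$ around $\theta$, so feature $j$ can be safely discarded whenever $|x_j^\top\theta| + \|x_j\|\sqrt{2G}/\lambda < 1$. The function name is hypothetical; the test can be called periodically inside any iterative solver, with the current iterate `beta`.

```python
import numpy as np

def gap_safe_screen(X, y, beta, lam):
    """Boolean mask of features that can be safely discarded for the Lasso."""
    r = y - X @ beta
    theta = r / max(lam, np.max(np.abs(X.T @ r)))       # dual-feasible point
    primal = 0.5 * r @ r + lam * np.abs(beta).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((y / lam - theta) ** 2)
    gap = max(primal - dual, 0.0)                        # clip tiny negative rounding
    radius = np.sqrt(2 * gap) / lam                      # safe sphere around theta
    # Feature j is zero at the optimum if |x_j^T theta'| < 1 for every theta' in the sphere
    return np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0) < 1
```

As the solver converges, the gap and hence the radius shrink to zero, so the test discards progressively more inactive features; this is the converging safe region described above.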

### GAP Safe screening rules for sparse multi-task and multi-class models

- Computer Science
- NIPS
- 2015

New safe rules for generalized linear models regularized with $\ell_1$ and $\ell_1/\ell_2$ norms are derived, based on duality gap computations and spherical safe regions whose diameters converge to zero, which safely discard more variables for low regularization parameters.

### Active Set Algorithms for the LASSO

- Computer Science
- 2011

This thesis studies the computation of the Least Absolute Shrinkage and Selection Operator (LASSO) and derived problems in regression analysis, and examines how three algorithms (active set, homotopy, and coordinate descent) handle some limit cases and can be applied to extended problems.

### Strong rules for discarding predictors in lasso‐type problems

- Computer Science
- Journal of the Royal Statistical Society. Series B, Statistical methodology
- 2012

This work proposes strong rules for discarding predictors in lasso regression and related problems, that are very simple and yet screen out far more predictors than the SAFE rules, and derives conditions under which they are foolproof.
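For $\frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1$ solved along a decreasing grid, the sequential strong rule discards feature $j$ at $\lambda_k$ if $|x_j^\top r(\lambda_{k-1})| < 2\lambda_k - \lambda_{k-1}$, where $r(\lambda_{k-1})$ is the residual at the previous solution. Because the rule is not safe, discarded features must be checked against the KKT condition $|x_j^\top r| \le \lambda$ once the reduced problem is solved. A minimal sketch, with hypothetical function names:

```python
import numpy as np

def sequential_strong_rule(X, residual_prev, lam_prev, lam):
    """Mask of features discarded at lam, given the residual at lam_prev > lam."""
    corr = np.abs(X.T @ residual_prev)
    return corr < 2 * lam - lam_prev

def kkt_violations(X, residual, lam, discarded, tol=1e-10):
    """Indices of discarded features whose optimality condition |x_j^T r| <= lam fails."""
    return np.flatnonzero(discarded & (np.abs(X.T @ residual) > lam + tol))
```

Any index returned by `kkt_violations` is added back to the problem, which is then re-solved; in practice such violations are rare, which is why the heuristic pays off.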

### Screening Tests for Lasso Problems

- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2017

Using a geometrically intuitive framework, this paper provides basic insights for understanding useful lasso screening tests and their limitations, and provides illustrative numerical studies on several datasets.

### Blitz: A Principled Meta-Algorithm for Scaling Sparse Optimization

- Computer Science
- ICML
- 2015

BLITZ is a fast working set algorithm accompanied by useful guarantees that outperforms existing solvers in sequential, limited-memory, and distributed settings and is not specific to l1-regularized learning, making the algorithm relevant to many applications involving sparsity or constraints.

### Coordinate descent algorithms for lasso penalized regression

- Computer Science
- 2008

This paper tests two exceptionally fast algorithms for estimating regression coefficients with a lasso penalty and proves that a greedy form of the $\ell_2$ algorithm converges to the minimum value of the objective function.
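A minimal sketch of cyclic coordinate descent for the $\ell_2$ (least-squares) loss with a lasso penalty, $\frac{1}{2}\|y - X\beta\|^2 + \lambda\|\beta\|_1$. Each coordinate update is an exact minimization given by soft-thresholding, and the residual is updated in place so that one update costs $O(n)$. The function names are illustrative; this is not a reproduction of the paper's code.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_epochs=200):
    """Cyclic coordinate descent for 1/2 ||y - X beta||^2 + lam ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()           # residual y - X beta
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_epochs):
        for j in range(p):
            if col_sq[j] == 0:
                continue                  # an all-zero column stays at 0
            old = beta[j]
            # Exact minimization in beta_j: soft-threshold the partial-residual correlation
            beta[j] = soft_threshold(old + X[:, j] @ r / col_sq[j], lam / col_sq[j])
            if beta[j] != old:
                r -= (beta[j] - old) * X[:, j]
    return beta
```

At convergence, $|x_j^\top r| \le \lambda$ for every feature, with equality and $\operatorname{sign}(x_j^\top r) = \operatorname{sign}(\beta_j)$ on the support; checking these conditions (or the duality gap) is the natural stopping criterion.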

### A new approach to variable selection in least squares problems

- Mathematics, Computer Science
- 2000

A compact descent method for solving the constrained problem for a particular value of κ is formulated, and a homotopy method, in which the constraint bound κ becomes the Homotopy parameter, is developed to completely describe the possible selection regimes.

### The Group-Lasso for generalized linear models: uniqueness of solutions and efficient algorithms

- Computer Science
- ICML '08
- 2008

Conditions for the uniqueness of Group-Lasso solutions are formulated which lead to an easily implementable test procedure that allows us to identify all potentially active groups and derive an efficient algorithm that can deal with input dimensions in the millions and can approximate the solution path efficiently.