Proximal Gradient Methods with Adaptive Subspace Sampling

@article{Grishchenko2021ProximalGM,
  title={Proximal Gradient Methods with Adaptive Subspace Sampling},
  author={Dmitry Grishchenko and Franck Iutzeler and J{\'e}r{\^o}me Malick},
  journal={Math. Oper. Res.},
  year={2021},
  volume={46},
  pages={1303--1323}
}
Many applications in machine learning or signal processing involve nonsmooth optimization problems. This nonsmoothness brings a low-dimensional structure to the optimal solutions. In this paper, we propose a randomized proximal gradient method harnessing this underlying structure. We introduce two key components: (i) a random subspace proximal gradient algorithm; and (ii) an identification-based sampling of the subspaces. Their interplay brings a significant performance improvement on typical… 
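To make the two components concrete, here is a minimal Python/NumPy sketch of a random-subspace proximal gradient method on a lasso problem (chosen here as a standard sparse example). The support-biased sampling weights are a crude, hypothetical stand-in for the paper's identification-based sampling, and all function names and constants are illustrative assumptions rather than the authors' exact (unbiased) scheme.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (coordinatewise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def subspace_prox_grad_lasso(A, b, lam, n_iters=500, k=20, seed=0):
    """Illustrative random-subspace proximal gradient for
    min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Each iteration updates only k randomly chosen coordinates; the
    sampling probabilities are skewed toward the current support of the
    iterate (a heuristic stand-in for identification-based sampling,
    not the paper's unbiased scheme).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = ||A||_2^2
    for _ in range(n_iters):
        # Sampling distribution: favor coordinates in the current support.
        w = np.where(x != 0, 10.0, 1.0)
        p = w / w.sum()
        S = rng.choice(n, size=min(k, n), replace=False, p=p)
        # Partial gradient of the smooth part on the sampled coordinates.
        r = A @ x - b
        g_S = A[:, S].T @ r
        # Proximal gradient step restricted to the sampled subspace
        # (valid here because the l1 prox is separable).
        x[S] = soft_threshold(x[S] - step * g_S, step * lam)
    return x

# Tiny usage example on synthetic sparse data.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 50))
x_true = np.zeros(50); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(100)
x_hat = subspace_prox_grad_lasso(A, b, lam=0.1)
print("recovered support size:", np.count_nonzero(np.abs(x_hat) > 1e-6))
```

The intent of the heuristic is the one stated in the abstract: once the iterate's support has stabilized (identification), most sampled coordinates lie in the low-dimensional subspace where the solution lives, so each iteration does far less work.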

Related papers

Nonsmoothness in Machine Learning: Specific Structure, Proximal Identification, and Applications

This paper presents the specific structure of nonsmooth optimization problems appearing in machine learning and illustrates how to leverage this structure in practice, for compression, acceleration, or dimension reduction.

Randomized subspace regularized Newton method for unconstrained non-convex optimization

This paper shows that the proposed method is globally convergent under appropriate assumptions, that its convergence rate matches that of the full regularized Newton method, and that it attains a local linear convergence rate that is the best one can hope for when using random subspaces.

Random coordinate descent methods for non-separable composite optimization

This paper designs two random coordinate descent methods for composite optimization problems whose objective function is the sum of two terms, one with a Lipschitz continuous gradient and another that is differentiable but non-separable.

Global optimization using random embeddings

This work proposes a random-subspace algorithmic framework for global optimization of Lipschitz-continuous objectives and shows numerically that this variant of X-REGO efficiently finds both the effective dimension and an approximate global minimizer of the original problem.
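As a toy illustration of the random-embedding idea only (not the X-REGO algorithm, which adaptively chooses the subspace dimension and uses a proper global solver), the sketch below restricts a high-dimensional objective to a few random low-dimensional affine subspaces and runs crude random search in each; all names and parameters are illustrative assumptions.

```python
import numpy as np

def random_embedding_minimize(f, D, d, n_embeddings=5, n_samples=2000,
                              radius=5.0, seed=0):
    """Minimize f over R^D by restricting it to random d-dimensional
    subspaces x = A y (A Gaussian) and running random search in each
    reduced space. Illustrates the random-embedding idea only."""
    rng = np.random.default_rng(seed)
    best_x, best_val = None, np.inf
    for _ in range(n_embeddings):
        A = rng.standard_normal((D, d)) / np.sqrt(d)   # random embedding
        Y = rng.uniform(-radius, radius, size=(n_samples, d))
        vals = np.array([f(A @ y) for y in Y])
        i = int(np.argmin(vals))
        if vals[i] < best_val:
            best_val, best_x = vals[i], A @ Y[i]
    return best_x, best_val

# Example: a 100-variable function with effective dimension 2.
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
x_hat, val = random_embedding_minimize(f, D=100, d=3, seed=1)
print("best value found:", round(val, 4))
```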

Randomised subspace methods for non-convex optimization, with applications to nonlinear least-squares

We propose a general random subspace framework for unconstrained nonconvex optimization problems that requires a weak probabilistic assumption on the subspace gradient, which we show to be satisfied…

Efficiency of stochastic coordinate proximal gradient methods on nonseparable composite optimization

This work is the first to propose a pure stochastic coordinate descent algorithm supported by global efficiency estimates for general nonseparable composite optimization problems, and it proves high-probability bounds on the number of iterations before a given optimality level is achieved.

References

Showing 1–10 of 50 references

Proximal Splitting Methods in Signal Processing

This work reviews the basic properties of proximity operators that are relevant to signal processing, together with optimization methods based on these operators, and shows that proximal splitting methods capture and extend several well-known algorithms in a unifying framework.
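For reference, the proximity operator and the forward-backward (proximal gradient) iteration that these splitting methods build on can be written, in standard notation with step size $\gamma > 0$, smooth $f$, and nonsmooth $g$:

```latex
\operatorname{prox}_{\gamma g}(v) = \arg\min_{x} \Big\{ g(x) + \tfrac{1}{2\gamma}\,\|x - v\|^2 \Big\},
\qquad
x^{k+1} = \operatorname{prox}_{\gamma g}\!\big( x^k - \gamma \nabla f(x^k) \big).
```

For $g = \lambda \|\cdot\|_1$, the proximity operator reduces to coordinatewise soft-thresholding, which is the special case used in the lasso sketch above.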

Optimization with Sparsity-Inducing Penalties

This monograph covers proximal methods, block-coordinate descent, reweighted-ℓ2 penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provides an extensive set of experiments to compare various algorithms from a computational point of view.

Are we there yet? Manifold identification of gradient-related proximal methods

This work gives an iteration bound for manifold identification, characterized in terms of the method's convergence rate and a problem-dependent constant that indicates the degree of degeneracy, giving intuition as to when lower active-set complexity may be expected in practice.

Accelerated Block-coordinate Relaxation for Regularized Optimization

A block-coordinate relaxation approach with proximal linearized subproblems yields convergence to critical points, while identification of the optimal manifold allows acceleration techniques to be applied on a reduced space.

Faster Coordinate Descent via Adaptive Importance Sampling

This work theoretically characterizes the performance of the selection rules, demonstrates improvements over the state of the art, and extends the theory and algorithms to general convex objectives.

Model Selection with Low Complexity Priors

This paper presents a unified sharp analysis of exact and robust recovery of the low-dimensional subspace model associated with the object to be recovered from partial measurements, and shows that the set of partly smooth functions relative to a linear manifold is closed under addition and pre-composition by a linear operator.

Stochastic Optimization with Importance Sampling for Regularized Loss Minimization

Stochastic optimization methods, including prox-SMD and prox-SDCA, are studied with importance sampling, which improves the convergence rate by reducing the stochastic variance; their effectiveness is analyzed theoretically and validated empirically.
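The core importance-sampling idea can be sketched as follows: sample example i with probability proportional to its smoothness constant and rescale its gradient by 1/(n p_i) so the estimate stays unbiased. The sketch below is an illustrative proximal SGD under a ridge penalty, not the prox-SMD or prox-SDCA algorithms analyzed in the paper; the function names and step-size schedule are assumptions.

```python
import numpy as np

def prox_l2(v, gamma, lam):
    """Prox of the ridge penalty g(x) = (lam/2)*||x||^2."""
    return v / (1.0 + gamma * lam)

def importance_sampled_prox_sgd(A, b, lam, n_iters=2000, gamma=0.1, seed=0):
    """Proximal SGD for least squares plus ridge, sampling rows with
    probability proportional to their smoothness constants L_i = ||a_i||^2
    and reweighting by 1/(n p_i) so the stochastic gradient stays unbiased."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.sum(A * A, axis=1)          # per-example smoothness constants
    p = L / L.sum()                    # importance-sampling distribution
    x = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.choice(n, p=p)
        # Unbiased estimate of the gradient of f(x) = (1/n) sum_i 0.5*(a_i^T x - b_i)^2
        g = (A[i] * (A[i] @ x - b[i])) / (n * p[i])
        step = gamma / np.sqrt(t)      # decaying step size
        x = prox_l2(x - step * g, step, lam)
    return x
```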

Adaptive Sampling Probabilities for Non-Smooth Optimization

This work employs a bandit optimization procedure that "learns" probabilities for sampling coordinates or examples in (non-smooth) optimization problems, allowing it to guarantee performance close to that of the optimal stationary sampling distribution.

Proximal Thresholding Algorithm for Minimization over Orthonormal Bases

This work proposes a versatile convex variational formulation for optimization over orthonormal bases that covers a wide range of problems, and establishes the strong convergence of a proximal thresholding algorithm to solve it.

Accelerated Coordinate Descent with Adaptive Coordinate Frequencies

This work proposes an extension of the CD algorithm, called the adaptive coordinate frequencies (ACF) method, which does not treat all coordinates equally: it does not pick every coordinate equally often for optimization.
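A rough sketch of the frequency-adaptation idea (not the ACF method itself): keep a per-coordinate preference weight, reward coordinates whose updates yield above-average objective decrease, and sample coordinates in proportion to the weights. The update rules and constants below are illustrative assumptions.

```python
import numpy as np

def acf_style_coordinate_descent(A, b, lam=0.1, n_iters=2000, seed=0):
    """Coordinate descent for min_x 0.5*||Ax - b||^2 + (lam/2)*||x||^2
    with adaptive coordinate frequencies: each coordinate carries a
    preference weight that grows when updating it gave an above-average
    objective decrease and shrinks otherwise."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    r = A @ x - b                      # residual, kept up to date
    col_sq = np.sum(A * A, axis=0)     # per-coordinate curvature terms
    w = np.ones(d)                     # preference weights
    avg_gain = 1e-12
    for _ in range(n_iters):
        p = w / w.sum()
        j = rng.choice(d, p=p)
        # Exact coordinate minimization for this quadratic objective.
        grad_j = A[:, j] @ r + lam * x[j]
        delta = -grad_j / (col_sq[j] + lam)
        gain = 0.5 * (col_sq[j] + lam) * delta ** 2   # objective decrease
        x[j] += delta
        r += delta * A[:, j]
        # Adapt the preference: reward above-average progress.
        avg_gain = 0.9 * avg_gain + 0.1 * gain
        w[j] = np.clip(w[j] * (1.2 if gain > avg_gain else 0.8), 0.1, 10.0)
    return x
```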