Coordinate descent with arbitrary sampling II: expected separable overapproximation

@article{Qu2016CoordinateDW,
  title={Coordinate descent with arbitrary sampling II: expected separable overapproximation},
  author={Zheng Qu and Peter Richt{\'a}rik},
  journal={Optimization Methods and Software},
  year={2016},
  volume={31},
  pages={858--884}
}
The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depend on the notion of expected separable overapproximation (ESO). This refers to an inequality involving the objective function and the sampling, capturing in a compact way certain smoothness properties of the function in a random subspace spanned by the sampled coordinates. ESO inequalities were previously… 
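For orientation, the ESO inequality referred to above can be sketched as follows; this is based on the standard definition used in the arbitrary-sampling coordinate descent literature, with the notation ($\hat{S}$, $p_i$, $h_{[\hat{S}]}$, $v$) reconstructed here rather than quoted from the paper. One says that $f$ admits an ESO with respect to the sampling $\hat{S}$ with parameters $v = (v_1, \dots, v_n) > 0$, written $(f, \hat{S}) \sim \mathrm{ESO}(v)$, if for all $x, h \in \mathbb{R}^n$

  \mathbb{E}\big[ f(x + h_{[\hat{S}]}) \big] \;\le\; f(x) + \sum_{i=1}^n p_i \nabla_i f(x)\, h_i + \frac{1}{2} \sum_{i=1}^n p_i v_i h_i^2,

where $p_i = \mathbb{P}(i \in \hat{S})$ and $h_{[\hat{S}]}$ agrees with $h$ on the coordinates in $\hat{S}$ and is zero elsewhere. The parameters $v_i$ then enter the stepsizes and the complexity bounds of the coordinate descent methods built on top of the ESO.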
Coordinate descent with arbitrary sampling I: algorithms and complexity
TLDR
A complexity analysis of ALPHA is provided, from which complexity bounds for its many variants are deduced as direct corollaries, all matching or improving the best known bounds.
Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling
TLDR
This work proposes and analyzes a novel primal-dual method (Quartz) which at every iteration samples and updates a random subset of the dual variables, chosen according to an arbitrary distribution.
On optimal probabilities in stochastic coordinate descent methods
TLDR
A new parallel coordinate descent method is proposed and analyzed, in which at each iteration a random subset of coordinates is updated in parallel, with the subsets chosen according to an arbitrary probability law; this is the first method of its type.
Nonconvex Variance Reduced Optimization with Arbitrary Sampling
TLDR
Surprisingly, this approach can in some regimes lead to superlinear speedup with respect to the minibatch size, which is not usually present in stochastic optimization.
Convergence Analysis of Block Coordinate Algorithms with Determinantal Sampling
TLDR
The convergence rate of the randomized Newton-like method for smooth and convex objectives, which uses random coordinate blocks of a Hessian overapproximation matrix $\mathbf{M}$ instead of the true Hessian, is analyzed, and a fundamental new expectation formula for determinantal point processes is derived.
Global Convergence of Arbitrary-Block Gradient Methods for Generalized Polyak-Łojasiewicz Functions
TLDR
The proportion function is introduced, which is further used to analyze all known (and many new) block-selection rules for block coordinate descent methods under a single framework and gives global convergence guarantees for a class of non-convex functions not previously considered in theory.
On Adaptive Cubic Regularized Newton's Methods for Convex Optimization via Random Sampling
TLDR
This paper proposes to compute an approximate Hessian matrix by either uniformly or non-uniformly sub-sampling the components of the objective of an unconstrained optimization model, develops both standard and accelerated adaptive cubic regularization approaches, and provides theoretical guarantees on global iteration complexity.
Accelerating Adaptive Cubic Regularization of Newton's Method via Random Sampling
TLDR
This paper proposes to compute an approximate Hessian matrix by either uniformly or non-uniformly sub-sampling the components of the objective and develops accelerated adaptive cubic regularization approaches, which provide theoretical guarantees on a global iteration complexity of $\mathcal{O}(\epsilon^{-1/3})$ with high probability.
...
...

References

Coordinate descent with arbitrary sampling I: algorithms and complexity
TLDR
A complexity analysis of ALPHA is provided, from which complexity bounds for its many variants are deduced as direct corollaries, all matching or improving the best known bounds.
Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling
TLDR
This work proposes and analyzes a novel primal-dual method (Quartz) which at every iteration samples and updates a random subset of the dual variables, chosen according to an arbitrary distribution, and generates efficient serial, parallel and distributed variants of the method.
Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization
We study the convergence properties of a (block) coordinate descent method applied to minimize a nondifferentiable (nonconvex) function $f(x_1, \ldots, x_N)$ with certain separability and regularity…
On optimal probabilities in stochastic coordinate descent methods
TLDR
A new parallel coordinate descent method is proposed and analyzed, in which at each iteration a random subset of coordinates is updated in parallel, with the subsets chosen according to an arbitrary probability law; this is the first method of its type.
Efficiency of randomized coordinate descent methods on minimization problems with a composite objective function
We develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an ε-accurate solution with…
Separable approximations and decomposition methods for the augmented Lagrangian
TLDR
An improved complexity bound for PCDM under strong convexity is proved, and it is shown that this bound is at least $8(L'/\bar{L})(\omega-1)^2$ times better than the best known bound for DQAM.
A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints
TLDR
If the smooth part of the objective function has a Lipschitz continuous gradient, then it is proved that the random coordinate descent method obtains an $\epsilon$-optimal solution in $\mathcal{O}(n^{2}/\epsilon)$ iterations, where n is the number of blocks.
Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties
TLDR
An asynchronous parallel stochastic proximal coordinate descent algorithm for minimizing a composite objective function, which consists of a smooth convex function added to a separable convex function, achieves a linear convergence rate on functions that satisfy an optimal strong convexity property and a sublinear rate on general convex functions.
Parallel coordinate descent methods for big data optimization
In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex…
...
...