A flexible coordinate descent method

@article{Fountoulakis2018AFC,
  title={A flexible coordinate descent method},
  author={Kimon Fountoulakis and Rachael Tappenden},
  journal={Computational Optimization and Applications},
  year={2018},
  volume={70},
  pages={351-394}
}
We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm's performance is more robust when applied to highly nonseparable or ill-conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed about that block and the model…
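
The abstract only sketches the per-iteration structure. As a hedged illustration (not the authors' exact FCD rules), the Python sketch below shows the generic pattern of one randomized block coordinate descent iteration with a blockwise quadratic model; the block list, the curvature oracle `block_hess`, and the damping parameter are placeholder choices.

```python
import numpy as np

def randomized_block_newton_step(x, grad_f, block_hess, blocks, rng, damping=1e-8):
    """One generic randomized block coordinate descent iteration with a
    blockwise quadratic model (smooth case). Placeholder sketch of the
    per-iteration structure described in the abstract, not the exact FCD
    method: a composite (nonsmooth) term would instead be handled by solving
    a prox-type block subproblem rather than the plain linear solve below."""
    b = blocks[rng.integers(len(blocks))]      # sample a block of coordinates
    g_b = grad_f(x)[b]                         # partial gradient on that block
    H_b = block_hess(x, b)                     # (approximate) blockwise curvature
    H_b = H_b + damping * np.eye(len(b))       # keep the quadratic model well posed
    d_b = np.linalg.solve(H_b, -g_b)           # minimizer of the quadratic model
    x_new = x.copy()
    x_new[b] += d_b                            # update only the sampled block
    return x_new
```

The uniform block sampling and unit step are the simplest choices here; nonuniform sampling and a line search or model-acceptance test are common refinements.
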
Inexact Variable Metric Stochastic Block-Coordinate Descent for Regularized Optimization
TLDR
The analysis generalizes, to the regularized case, Nesterov’s proposal for improving convergence of block-coordinate descent by sampling proportional to the blockwise Lipschitz constants, and improves the convergence rate in the convex case by weakening the dependency on the initial objective value.
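
For concreteness, Nesterov-style nonuniform sampling picks block i with probability proportional to its blockwise Lipschitz constant L_i. A minimal sketch, assuming the constants (or estimates of them) are available:

```python
import numpy as np

def sample_block_lipschitz(L, rng):
    """Sample a block index with probability proportional to its blockwise
    Lipschitz constant L[i] (Nesterov-style nonuniform sampling). The
    constants are assumed to be known or estimated upstream."""
    p = np.asarray(L, dtype=float)
    p = p / p.sum()
    return rng.choice(len(p), p=p)
```
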
SONIA: A Symmetric Blockwise Truncated Optimization Algorithm
TLDR
Theoretical results are presented to confirm that the algorithm converges to a stationary point in both the strongly convex and nonconvex cases, and a stochastic variant of the algorithm is also presented, along with corresponding theoretical guarantees.
Greed is good: greedy optimization methods for large-scale structured problems
TLDR
This dissertation shows that greedy coordinate descent and Kaczmarz methods have efficient implementations and can be faster than their randomized counterparts for certain common problem structures in machine learning, and shows linear convergence for greedy (block) coordinate descent methods under a relaxation of strong convexity dating back to 1963.
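
The greedy selection rule underlying these methods is the Gauss-Southwell rule: update the coordinate with the largest partial-derivative magnitude. A minimal sketch for a smooth objective, where `step_sizes` is an assumed array of per-coordinate step sizes (e.g. reciprocals of coordinate-wise Lipschitz constants):

```python
import numpy as np

def gauss_southwell_step(x, grad_f, step_sizes):
    """Greedy (Gauss-Southwell) coordinate descent step: update the coordinate
    whose partial derivative is largest in absolute value."""
    g = grad_f(x)
    i = int(np.argmax(np.abs(g)))       # greedy coordinate selection
    x_new = x.copy()
    x_new[i] -= step_sizes[i] * g[i]    # single-coordinate gradient step
    return x_new
```
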
Fast and Safe: Accelerated Gradient Methods With Optimality Certificates And Underestimate Sequences
TLDR
This work introduces the concept of an Underestimate Sequence (UES), which is motivated by Nesterov’s estimate sequence, and proposes several first-order methods for minimizing strongly convex functions in both the smooth and composite cases.
Globalized inexact proximal Newton-type methods for nonconvex composite functions
TLDR
This work presents a globalized proximal Newton-type method which allows the smooth term to be nonconvex, and some numerical results indicate that the method is also very promising from a practical point of view.
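
For orientation, proximal Newton-type methods for a composite objective $f + \varphi$ (smooth $f$, possibly nonsmooth convex $\varphi$) compute the step from a subproblem of roughly the following form, with $H_k$ a Hessian or Hessian approximation at $x_k$; the precise globalization and inexactness conditions are what the papers above differ on.

```latex
d_k \in \operatorname*{arg\,min}_{d}\;
  \nabla f(x_k)^{\top} d + \tfrac{1}{2}\, d^{\top} H_k\, d + \varphi(x_k + d),
\qquad
x_{k+1} = x_k + \alpha_k\, d_k .
```
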
Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
TLDR
This paper proposes new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule and considers optimal active manifold identification, which leads to bounds on the "active set complexity" of BCD methods and leads to superlinear convergence for certain problems with sparse solutions.
Graphical Newton for Huge-Block Coordinate Descent on Sparse Graphs
TLDR
This paper shows how to use message-passing to compute the Newton step in O(|b|) time when the block has a forest-structured dependency graph, allowing huge blocks to be updated for sparse problems and resulting in significant numerical improvements over existing approaches.
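
A hedged illustration of why forest structure gives an O(|b|) Newton step: in the special case of a chain-structured block, the Newton system is tridiagonal and can be solved in linear time by forward elimination and back substitution (the Thomas algorithm); general forests are handled analogously by eliminating leaf variables first.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system in O(n): the chain-structured special
    case of computing a Newton step on a block whose dependency graph is a
    forest. lower/upper have length n-1; diag and rhs have length n."""
    n = len(diag)
    d = np.asarray(diag, dtype=float).copy()
    r = np.asarray(rhs, dtype=float).copy()
    # Forward elimination: remove the subdiagonal, one variable at a time.
    for i in range(1, n):
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    # Back substitution along the chain.
    x = np.empty(n)
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x
```
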
Second order semi-smooth Proximal Newton methods in Hilbert spaces
We develop a globalized Proximal Newton method for composite and possibly non-convex minimization problems in Hilbert spaces. Additionally, we impose less restrictive assumptions on the composite…
Accelerating block coordinate descent methods with identification strategies
TLDR
An identification function tailored for bound-constrained composite minimization is devised, together with an associated version of the BCDM that is also globally convergent; this gives rise to an efficient practical strategy for Lasso and $\ell_1$-regularized logistic regression.
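
The flavor of such identification strategies can be illustrated with the standard proximal-gradient (soft-thresholding) step for the Lasso: coordinates the step leaves at zero form an estimate of the final sparsity pattern, so the method can concentrate work on the remaining ones. This is a generic sketch, not the paper's specific identification function; the problem data `A`, `b`, `lam`, and `step` are placeholders.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (component-wise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_step_with_support_estimate(x, A, b, lam, step):
    """One proximal-gradient step for 0.5*||Ax - b||^2 + lam*||x||_1, returning
    both the new iterate and the estimated support (coordinates left nonzero).
    Generic illustration of identification, not the paper's exact function."""
    grad = A.T @ (A @ x - b)                       # gradient of the smooth part
    x_new = soft_threshold(x - step * grad, step * lam)
    support = np.flatnonzero(x_new != 0.0)         # current sparsity-pattern estimate
    return x_new, support
```
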
...
...

References

Showing 1-10 of 52 references
A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints
TLDR
If the smooth part of the objective function has a Lipschitz continuous gradient, then it is proved that the random coordinate descent method obtains an $\epsilon$-optimal solution in $\mathcal{O}(n^{2}/\epsilon)$ iterations, where n is the number of blocks.
Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization
TLDR
A block-coordinate gradient descent method is proposed for solving the problem of minimizing the weighted sum of a smooth function f and a convex function P of n real variables subject to m linear equality constraints, with the coordinate block chosen by a Gauss-Southwell-q rule based on sufficient predicted descent.
An Inexact Successive Quadratic Approximation Method for Convex L-1 Regularized Optimization
TLDR
A Newton-like method for the minimization of an objective function that is the sum of a smooth convex function and an $\ell_1$ regularization term is studied, and inexactness conditions that guarantee global convergence and that can be used to control the local rate of convergence of the iteration are given.
A coordinate gradient descent method for nonsmooth separable minimization
TLDR
A (block) coordinate gradient descent method for solving this class of nonsmooth separable problems and establishes global convergence and, under a local Lipschitzian error bound assumption, linear convergence for this method.
Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization
TLDR
This work proposes an inexact parallel BCD approach where at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function, and investigates the convergence of this parallel BCD method for both randomized and cyclic variable selection rules.
An inexact successive quadratic approximation method for L-1 regularized optimization
TLDR
The inexactness conditions are based on a semi-smooth function that represents a (continuous) measure of the optimality conditions of the problem, and that embodies the soft-thresholding iteration.
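
The kind of optimality measure described here is typically some variant of the natural (proximal-gradient) residual for $\min_x f(x) + \lambda\|x\|_1$, which vanishes exactly at solutions and is built from the soft-thresholding operator $S_\lambda$; the paper's precise choice may differ, so the following is only the standard form.

```latex
r(x) \;=\; x - S_{\lambda}\bigl(x - \nabla f(x)\bigr),
\qquad
\bigl[S_{\lambda}(z)\bigr]_i \;=\; \operatorname{sign}(z_i)\,\max\{\lvert z_i\rvert - \lambda,\ 0\}.
```
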
Accelerated, Parallel, and Proximal Coordinate Descent
TLDR
A new randomized coordinate descent method for minimizing the sum of convex functions each of which depends on a small number of coordinates only, which can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent.
Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
TLDR
A randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function is developed, and it is proved that it obtains an accurate solution with probability at least $1-\rho$ in at most $O(n/\varepsilon)$ iterations, thus achieving the first true iteration complexity bounds.
Parallel coordinate descent methods for big data optimization
In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex…
...
...