A flexible coordinate descent method

@article{Fountoulakis2018AFC,
  title={A flexible coordinate descent method},
  author={Kimon Fountoulakis and Rachael Tappenden},
  journal={Computational Optimization and Applications},
  year={2018},
  volume={70},
  pages={351-394}
}
We present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm's performance is more robust when applied to highly nonseparable or ill-conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed about that block and the model…
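
For intuition, here is a minimal sketch of the kind of update FCD builds on: sample a block of coordinates at random, form a quadratic model over that block using the corresponding block of curvature information, and minimize the model. The sketch assumes a purely smooth quadratic objective f(x) = ½xᵀAx − bᵀx and an exact block solve; the paper's FCD method is more general (composite objectives, approximate curvature, inexact model minimization), so this is illustrative only.

```python
import numpy as np

def block_newton_cd(A, b, num_iters=200, block_size=5, seed=0):
    """Illustrative randomized block coordinate descent with partial
    second-order (curvature) information, for the smooth quadratic
    f(x) = 0.5*x'Ax - b'x with A symmetric positive definite.
    This is a simplified sketch, not the FCD algorithm of the paper."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    for _ in range(num_iters):
        S = rng.choice(n, size=block_size, replace=False)  # sample a block
        grad_S = A[S] @ x - b[S]                           # block gradient
        H_SS = A[np.ix_(S, S)]                             # block curvature
        x[S] -= np.linalg.solve(H_SS, grad_S)              # minimize the block model
    return x

# usage on a random well-conditioned SPD system
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x = block_newton_cd(A, b)
print(np.linalg.norm(A @ x - b))  # residual should be small
```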

Inexact Variable Metric Stochastic Block-Coordinate Descent for Regularized Optimization

The analysis generalizes, to the regularized case, Nesterov's proposal for improving the convergence of block-coordinate descent by sampling blocks with probabilities proportional to the blockwise Lipschitz constants, and improves the convergence rate in the convex case by weakening the dependence on the initial objective value.

SONIA: A Symmetric Blockwise Truncated Optimization Algorithm

Theoretical results are presented to confirm that the algorithm converges to a stationary point in both the strongly convex and nonconvex cases, and a stochastic variant of the algorithm is also presented, along with corresponding theoretical guarantees.

Greed is good : greedy optimization methods for large-scale structured problems

This dissertation shows that greedy coordinate descent and Kaczmarz methods have efficient implementations and can be faster than their randomized counterparts for certain common problem structures in machine learning, and shows linear convergence for greedy (block) coordinate descent methods under the Polyak–Łojasiewicz condition, a relaxation of strong convexity dating from 1963 that the dissertation revives.

Newton-Laplace Updates for Block Coordinate Descent

Nutini et al. show that when the chosen block's sparsity pattern has a tree structure, “message-passing” algorithms can be used to solve the Newton system in linear time, exploiting the width of the Hessian's computation graph to speed up the Newton update.

Fast and Safe: Accelerated Gradient Methods With Optimality Certificates And Underestimate Sequences

This work introduces the concept of an Underestimate Sequence (UES), which is motivated by Nesterov's estimate sequence, and proposes several first-order methods for minimizing strongly convex functions in both the smooth and composite cases.
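
As a hedged aside on where such computable certificates can come from (a standard strong-convexity bound, not the UES construction itself): if f is μ-strongly convex and smooth, every iterate x_k supplies the lower bound f(x*) ≥ f(x_k) − ‖∇f(x_k)‖²/(2μ), so ‖∇f(x_k)‖²/(2μ) bounds the optimality gap at x_k; loosely, an underestimate sequence maintains a family of such global lower bounds that tightens as the iterates progress.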

Globalized inexact proximal Newton-type methods for nonconvex composite functions

This work presents a globalized proximal Newton-type method that allows the smooth term to be nonconvex; numerical results indicate that the method is also promising from a practical point of view.

Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

This paper proposes new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule and considers optimal active manifold identification, which yields bounds on the “active-set complexity” of BCD methods and superlinear convergence for certain problems with sparse solutions.
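
To make the selection rule concrete, here is a minimal single-coordinate sketch of Gauss-Southwell (greedy) coordinate descent on a smooth quadratic. The cited work is about block generalizations and sharper rules, so this snippet only illustrates the basic greedy idea under its own simplifying assumptions.

```python
import numpy as np

def greedy_cd(A, b, num_iters=500):
    """Gauss-Southwell coordinate descent for f(x) = 0.5*x'Ax - b'x,
    with A symmetric positive definite: at each step, update the single
    coordinate whose gradient entry is largest in magnitude, using an
    exact coordinate-wise minimization.  Illustrative sketch only."""
    n = A.shape[0]
    x = np.zeros(n)
    grad = A @ x - b
    for _ in range(num_iters):
        i = np.argmax(np.abs(grad))   # Gauss-Southwell (greedy) selection
        step = grad[i] / A[i, i]      # exact minimization along coordinate i
        x[i] -= step
        grad -= step * A[:, i]        # O(n) gradient update
    return x
```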

Graphical Newton for Huge-Block Coordinate Descent on Sparse Graphs

This paper shows how to use message-passing to compute the Newton step in O(|b|) time when the block has a forest-structured dependency graph, allowing huge blocks to be updated for sparse problems and resulting in significant numerical improvements over existing approaches.
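
As a toy illustration of the message-passing idea (my own sketch, not code from the paper): on a chain, the simplest tree, the Newton system is tridiagonal, and one forward elimination pass plus one back-substitution pass, i.e., the classical Thomas algorithm, solves it in O(|b|) time. The same two-pass schedule generalizes to any forest-structured block.

```python
import numpy as np

def solve_chain(diag, off, rhs):
    """Solve H x = rhs where H is symmetric tridiagonal (the block's
    dependency graph is a chain, the simplest tree).  The forward and
    backward passes correspond to the two message-passing sweeps on that
    tree and run in linear time.  Illustrative sketch only."""
    n = len(diag)
    c = np.zeros(n - 1)   # modified super-diagonal
    d = np.zeros(n)       # modified right-hand side
    c[0] = off[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                     # forward (leaf-to-root) pass
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):            # backward (root-to-leaf) pass
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# usage: a small SPD chain system
n = 6
diag, off, rhs = np.full(n, 2.0), np.full(n - 1, -1.0), np.ones(n)
H = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
print(np.allclose(H @ solve_chain(diag, off, rhs), rhs))  # True
```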

Second order semi-smooth Proximal Newton methods in Hilbert spaces

We develop a globalized Proximal Newton method for composite and possibly non-convex minimization problems in Hilbert spaces. Additionally, we impose less restrictive assumptions on the composite…

References

Showing 1–10 of 56 references.

A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints

If the smooth part of the objective function has a Lipschitz continuous gradient, it is proved that the random coordinate descent method obtains an ε-optimal solution in O(n²/ε) iterations, where n is the number of blocks.

Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization

A block-coordinate gradient descent method is proposed for solving the problem of minimizing the weighted sum of a smooth function f and a convex function P of n real variables subject to m linear equality constraints, with the coordinate block chosen by a Gauss-Southwell-q rule based on sufficient predicted descent.

An Inexact Successive Quadratic Approximation Method for Convex L-1 Regularized Optimization

A Newton-like method is studied for the minimization of an objective function that is the sum of a smooth convex function and an ℓ1 regularization term, and inexactness conditions are given that guarantee global convergence and that can be used to control the local rate of convergence of the iteration.

On the convergence of inexact block coordinate descent methods for constrained optimization

A coordinate gradient descent method for nonsmooth separable minimization

A (block) coordinate gradient descent method is proposed for solving this class of nonsmooth separable problems, and global convergence is established, along with linear convergence under a local Lipschitzian error bound assumption.

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

This work proposes an inexact parallel BCD approach in which, at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function, and investigates the convergence of this parallel BCD method for both randomized and cyclic variable selection rules.

An inexact successive quadratic approximation method for L-1 regularized optimization

The inexactness conditions are based on a semi-smooth function that represents a (continuous) measure of the optimality conditions of the problem, and that embodies the soft-thresholding iteration.
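
One standard example of such a measure (a generic sketch under my own assumptions, not the specific test of the cited paper) is the proximal-gradient fixed-point residual for min f(x) + λ‖x‖₁: it is built from a single soft-thresholding step and vanishes exactly at minimizers.

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def optimality_residual(x, grad, lam, t=1.0):
    """Norm of x - prox_{t*lam*||.||_1}(x - t*grad), a continuous measure
    of the optimality conditions for min f(x) + lam*||x||_1, where grad is
    the gradient of the smooth part f at x.  For convex f it is zero if
    and only if x is a minimizer, for any fixed t > 0."""
    return np.linalg.norm(x - soft_threshold(x - t * grad, t * lam))
```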

Accelerated, Parallel, and Proximal Coordinate Descent

A new randomized coordinate descent method is proposed for minimizing the sum of convex functions, each of which depends on a small number of coordinates only; the method can be implemented without full-dimensional vector operations, which are the major bottleneck of accelerated coordinate descent.

Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

A randomized block-coordinate descent method is developed for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function, and it is proved to obtain an ε-accurate solution with probability at least 1 − ρ in at most O(n/ε) iterations, thus achieving the first true iteration complexity bounds.
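
A minimal sketch of this style of method, specialized to blocks of size one and a lasso-type objective ½xᵀAx − bᵀx + λ‖x‖₁ (my own illustrative assumptions, not the exact algorithm or step sizes of the paper):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def random_prox_cd(A, b, lam, num_iters=2000, seed=0):
    """Randomized proximal coordinate descent for the composite problem
    min_x 0.5*x'Ax - b'x + lam*||x||_1, with A symmetric positive
    definite.  Coordinate i of the smooth part has Lipschitz constant
    A[i, i].  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.integers(n)                 # uniform coordinate sampling
        grad_i = A[i] @ x - b[i]            # partial gradient of the smooth part
        L_i = A[i, i]                       # coordinate-wise Lipschitz constant
        x[i] = soft_threshold(x[i] - grad_i / L_i, lam / L_i)  # prox step
    return x
```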

Parallel coordinate descent methods for big data optimization

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex…