Accelerated, Parallel, and Proximal Coordinate Descent

Abstract

We propose a new stochastic coordinate descent method for minimizing the sum of convex functions, each of which depends on only a small number of coordinates. Our method (APPROX) is simultaneously Accelerated, Parallel and PROXimal; this is the first time such a method has been proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate 2ω̄L̄R²/(k+1)², where k is the iteration counter, ω̄ is an average degree of separability of the loss function, L̄ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and R is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is considered to be the major bottleneck of accelerated coordinate descent. The fact that the method depends on the average degree of separability, and not on the maximum degree of separability, can be attributed to the use of new safe large stepsizes, leading to an improved expected separable overapproximation (ESO). These stepsizes are of independent interest and can be utilized in all existing parallel stochastic coordinate descent algorithms based on the concept of ESO.
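To illustrate the difference between the maximum and the average degree of separability mentioned in the abstract, here is a minimal sketch. It assumes each function in the sum is described by a sparse row mapping coordinate indices to coefficients, and it uses a plain unweighted average of the per-function degrees. The abstract does not spell out how ω̄ is weighted, so the unweighted mean, the helper names and the toy data below are illustrative assumptions, not the paper's definition.

```python
# Illustrative sketch only: unweighted average vs. maximum degree of separability.
# Assumption: function f_j depends exactly on the coordinates with a nonzero
# coefficient in rows[j]. All names and data here are hypothetical.

def separability_degrees(rows):
    """Return, for each function f_j, the number of coordinates it depends on."""
    return [sum(1 for coeff in row.values() if coeff != 0) for row in rows]


def summarize(rows):
    """Return (maximum degree, unweighted average degree) over all functions."""
    degrees = separability_degrees(rows)
    return max(degrees), sum(degrees) / len(degrees)


# Toy problem with 4 functions over 4 coordinates; one function is much
# denser than the others, so the maximum overstates the typical coupling.
rows = [
    {0: 1.0, 1: -2.0},
    {2: 0.5},
    {0: 3.0, 1: 1.0, 2: 1.0, 3: -1.0},
    {3: 2.0},
]

omega_max, omega_avg = summarize(rows)
print(omega_max, omega_avg)
```

In this toy example a single dense function drives the maximum degree up while the average stays low, which is the kind of situation where a bound depending on the average rather than the maximum is less pessimistic.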

DOI: 10.1137/130949993

Cite this paper

@article{Fercoq2015AcceleratedPA,
  title={Accelerated, Parallel, and Proximal Coordinate Descent},
  author={Olivier Fercoq and Peter Richt{\'a}rik},
  journal={SIAM Journal on Optimization},
  year={2015},
  volume={25},
  pages={1997-2023}
}