# Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server

@article{Aytekin2016AnalysisAI, title={Analysis and Implementation of an Asynchronous Optimization Algorithm for the Parameter Server}, author={Arda Aytekin and Hamid Reza Feyzmahdavian and Mikael Johansson}, journal={ArXiv}, year={2016}, volume={abs/1610.05507} }

This paper presents an asynchronous incremental aggregated gradient algorithm and its implementation in a parameter server framework for solving regularized optimization problems. The algorithm can handle both general convex (possibly non-smooth) regularizers and general convex constraints. When the empirical data loss is strongly convex, we establish a linear convergence rate, give explicit expressions for step-size choices that guarantee convergence to the optimum, and bound the associated…
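To make the abstract's idea concrete, the following is a minimal single-process sketch of a proximal incremental aggregated gradient iteration with stale gradients. It is not the authors' implementation: the function names (`soft_threshold`, `grad_least_squares`, `piag_with_delay`), the least-squares loss, the ℓ1 regularizer, the round-robin worker order, the fixed `delay`, and the hand-picked `step` are all assumptions made for illustration. The paper's own step-size expressions are not reproduced here.

```python
from collections import deque


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def grad_least_squares(a, b, x):
    """Gradient of the hypothetical component loss f_i(x) = 0.5 * (a . x - b)^2."""
    r = sum(aj * xj for aj, xj in zip(a, x)) - b
    return [r * aj for aj in a]


def piag_with_delay(A, b, lam, step, delay, iters):
    """Serially simulate a master that aggregates stale worker gradients.

    Worker i "owns" row i of A. In each iteration one worker (round-robin,
    an assumption) reports a gradient computed at an iterate that is
    `delay` steps old; the master swaps it into the aggregate and takes a
    proximal step on the l1 regularizer.
    """
    n, d = len(A), len(A[0])
    x = [0.0] * d
    # Recent iterates; history[0] is the oldest one still kept.
    history = deque([list(x)], maxlen=delay + 1)
    grads = [grad_least_squares(A[i], b[i], x) for i in range(n)]
    agg = [sum(g[j] for g in grads) for j in range(d)]
    for k in range(iters):
        i = k % n
        stale_x = history[0]  # the worker only saw an outdated iterate
        new_g = grad_least_squares(A[i], b[i], stale_x)
        # Replace worker i's old contribution in the aggregated gradient.
        agg = [s - old + new for s, old, new in zip(agg, grads[i], new_g)]
        grads[i] = new_g
        # Gradient step on the aggregate, then the proximal (shrinkage) step.
        x = soft_threshold([xj - step * sj for xj, sj in zip(x, agg)], step * lam)
        history.append(list(x))
    return x


# Toy problem: minimize (x1 - 3)^2 + (x2 - 0.1)^2 + |x1| + |x2|.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.1, 3.0, 0.1]
x_hat = piag_with_delay(A, b, lam=1.0, step=0.05, delay=2, iters=2000)
print(x_hat)
```

On this toy problem the optimality conditions give the minimizer (2.5, 0): the ℓ1 term shrinks the first coordinate and sets the second one exactly to zero. The step size here was chosen conservatively by hand so that the iteration tolerates the simulated staleness; larger delays would generally call for smaller steps.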

## 38 Citations

Optimal convergence rates of totally asynchronous optimization

- Computer Science
- 2022

This paper derives explicit convergence rates for the proximal incremental aggregated gradient (PIAG) and the asynchronous block-coordinate descent (Async-BCD) methods under a specific model of total asynchrony, and shows that the derived rates are order-optimal.

A Distributed Flexible Delay-Tolerant Proximal Gradient Algorithm

- Computer Science
- SIAM J. Optim.
- 2020

This work develops and analyzes an asynchronous algorithm for distributed convex optimization when the objective is the sum of smooth functions, each local to a worker, and a non-smooth function. It proves that the algorithm converges linearly in the strongly convex case and provides convergence guarantees for the non-strongly convex case.

Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees

- Computer Science, Mathematics
- ArXiv
- 2021

The results shorten, streamline and strengthen existing convergence proofs for several asynchronous optimization methods, and allow us to establish convergence guarantees for popular algorithms that were thus far lacking a complete theoretical understanding.

A Provably Communication-Efficient Asynchronous Distributed Inference Method for Convex and Nonconvex Problems

- Computer Science, Mathematics
- IEEE Transactions on Signal Processing
- 2020

It is proved that for nonconvex nonsmooth problems, the proposed algorithm converges to a stationary point with a sublinear rate over the number of communication rounds, coinciding with the best theoretical rate that can be achieved for this class of problems.

Distributed learning with compressed gradients

- Computer Science
- 2018

A unified analysis framework for distributed gradient methods operating with stale and compressed gradients is presented, and non-asymptotic bounds on convergence rates and information exchange are derived for several optimization algorithms.

Distributed Deterministic Asynchronous Algorithms in Time-Varying Graphs Through Dykstra Splitting

- Mathematics, Computer Science
- SIAM J. Optim.
- 2019

This work considers the setting where each vertex of a graph has a function, and communications can only occur between vertices connected by an edge, and proposes a distributed version of Dykstra's algorithm to minimize the sum of these functions.

DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization

- Computer Science
- J. Mach. Learn. Res.
- 2019

This paper works with the saddle-point formulation of large linear models with convex loss functions, and proposes a family of randomized primal-dual block coordinate algorithms that are especially suitable for asynchronous distributed implementation with parameter servers.

Advances in Asynchronous Parallel and Distributed Optimization

- Computer Science
- Proceedings of the IEEE
- 2020

This article reviews recent developments in the design and analysis of asynchronous optimization methods, covering both centralized methods, where all processors update a master copy of the optimization variables, and decentralized methods, where each processor maintains a local copy of the variables. The analysis provides insights into how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.

Asynchronous Distributed Learning with Sparse Communications and Identification

- Computer Science
- ArXiv
- 2018

An asynchronous optimization algorithm for distributed learning is proposed that efficiently reduces the communications between a master and working machines by randomly sparsifying the local updates, and identifies near-optimal sparsity patterns, so that all communications eventually become sparse.

A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning

- Computer Science
- ICML
- 2018

This work proposes and analyzes a flexible asynchronous optimization algorithm for solving nonsmooth learning problems and proves that the algorithm converges linearly with a fixed learning rate that depends neither on communication delays nor on the number of machines.

## References

On the Convergence Rate of Incremental Aggregated Gradient Algorithms

- Computer Science
- SIAM J. Optim.
- 2017

It is shown that this deterministic incremental aggregated gradient method has global linear convergence and the convergence rate is characterized, and an aggregated method with momentum is considered and its linear convergence is demonstrated.

An asynchronous mini-batch algorithm for regularized stochastic optimization

- Computer Science
- 2015 54th IEEE Conference on Decision and Control (CDC)
- 2015

This work proposes an asynchronous mini-batch algorithm for regularized stochastic optimization problems that eliminates idle waiting and allows workers to run at their maximal update rates and enjoys near-linear speedup if the number of workers is O(1/√ϵ).

Distributed delayed stochastic optimization

- Computer Science
- 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
- 2012

This work shows n-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as O(1/√(nT)) after T iterations, a rate known to be optimal for a distributed system with n nodes even in the absence of delays.

Parameter Server for Distributed Machine Learning

- Computer Science
- 2013

This work proposes a parameter server framework to solve distributed machine learning problems, presents algorithms and theoretical analysis for challenging nonconvex and nonsmooth problems, and shows experimental results on real data with billions of parameters.

Optimal Distributed Online Prediction Using Mini-Batches

- Computer Science
- J. Mach. Learn. Res.
- 2012

This work presents the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms that is asymptotically optimal for smooth convex loss functions and stochastic inputs and proves a regret bound for this method.

Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods

- Computer Science, Mathematics
- SIAM J. Optim.
- 2018

This paper is the first study that establishes the convergence rate properties of the PIAG method for any deterministic order, and shows that the PIAG algorithm is globally convergent with a linear rate provided that the step size is sufficiently small.

A Convergent Incremental Gradient Method with a Constant Step Size

- Mathematics, Computer Science
- SIAM J. Optim.
- 2007

An incremental aggregated gradient method for minimizing a sum of continuously differentiable functions is proposed, and it is shown that the method visits infinitely often regions in which the gradient is small.

Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems

- Computer Science, Mathematics
- SIAM J. Optim.
- 2012

Surprisingly enough, for certain classes of objective functions, the complexity results for the proposed methods for solving huge-scale optimization problems are better than the standard worst-case bounds for deterministic algorithms.

Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning

- Computer Science
- SIAM J. Optim.
- 2015

This work proposes an incremental majorization-minimization scheme for minimizing a large sum of continuous functions, a problem of utmost importance in machine learning, and presents convergence guarantees for nonconvex and convex optimization when the upper bounds approximate the objective up to a smooth error.

Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

- Computer Science
- NIPS
- 2011

This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking, and presents an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work.