• Corpus ID: 235592902

# Asynchronous Stochastic Optimization Robust to Arbitrary Delays

@inproceedings{Cohen2021AsynchronousSO,
title={Asynchronous Stochastic Optimization Robust to Arbitrary Delays},
author={Alon Cohen and Amit Daniely and Yoel Drori and Tomer Koren and Mariano Schain},
booktitle={NeurIPS},
year={2021}
}
• Published in NeurIPS 22 June 2021
• Computer Science
We consider stochastic optimization with delayed gradients where, at each time step t, the algorithm makes an update using a stale stochastic gradient from step t − d_t for some arbitrary delay d_t. This setting abstracts asynchronous distributed optimization where a central server receives gradient updates computed by worker machines. These machines can experience computation and communication loads that might vary significantly over time. In the general non-convex smooth optimization…
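The delayed-update rule the abstract describes can be sketched in a few lines. This is an illustrative simulation only; the function name, step size, and delay distribution are assumptions, not the paper's algorithm:

```python
import random

def delayed_sgd(grad, x0, steps, max_delay, lr=0.05, seed=0):
    """Run SGD where the update at step t uses a gradient evaluated at
    the iterate from step t - d_t, for an arbitrary delay d_t."""
    rng = random.Random(seed)
    x = x0
    history = [x0]  # past iterates that stale gradients are evaluated at
    for t in range(steps):
        d_t = rng.randint(0, min(t, max_delay))  # arbitrary delay at step t
        stale_x = history[t - d_t]               # iterate from step t - d_t
        x = x - lr * grad(stale_x)               # apply the stale gradient
        history.append(x)
    return x

# Example: minimize f(x) = x^2 (gradient 2x); despite delays of up to
# 5 steps, the iterate still approaches the minimizer at 0.
x_final = delayed_sgd(lambda x: 2 * x, x0=5.0, steps=200, max_delay=5)
```

For this quadratic the step size must shrink as delays grow; with large delays and a large step size the same recurrence diverges, which is the regime the paper's delay-robust guarantees address.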

## 4 Citations

### Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

• Computer Science
ArXiv
• 2022
Studies the asynchronous stochastic gradient descent algorithm for distributed training over n workers whose computation and communication frequencies vary over time, and shows for the first time that asynchronous SGD is always faster than mini-batch SGD.

### Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

• Computer Science
ArXiv
• 2022
This work introduces a novel recursion based on “virtual iterates” and delay-adaptive stepsizes, which allow it to derive state-of-the-art guarantees for both convex and non-convex objectives.
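As an illustration of what a delay-adaptive stepsize might look like, here is a hypothetical rule that shrinks the step in proportion to the staleness of the gradient being applied; the exact schedule in the cited work differs:

```python
def delay_adaptive_stepsize(base_lr: float, d_t: int) -> float:
    """Illustrative delay-adaptive rule: scale the base stepsize down by
    (1 + delay) so that very stale gradients move the iterate less."""
    return base_lr / (1 + d_t)

# A fresh gradient (d_t = 0) gets the full stepsize; a gradient that is
# 9 steps stale gets one tenth of it.
fresh = delay_adaptive_stepsize(0.1, 0)
stale = delay_adaptive_stepsize(0.1, 9)
```

The design intuition is that staleness inflates the gradient's error relative to the current iterate, so damping the step by the delay keeps the update's bias under control without knowing the delays in advance.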

### Distributed Distributionally Robust Optimization with Non-Convex Objectives

• Computer Science
• 2022
An asynchronous distributed algorithm, named ASPIRE, is proposed with the EASE method to tackle the distributed distributionally robust optimization (DDRO) problem; it not only achieves fast convergence and remains robust against data heterogeneity as well as malicious attacks, but can also trade off robustness against performance.

### Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

• Computer Science
ArXiv
• 2022
This paper presents the first algorithms that achieve near-optimal √(K + D) regret, where K is the number of episodes and D = ∑_{k=1}^{K} d_k is the total delay, significantly improving upon the best known regret bound of K + D.

## References

Showing 1-10 of 31 references

### Distributed delayed stochastic optimization

• Computer Science
2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
• 2012
This work shows n-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as O(1/√(nT)) after T iterations, a rate known to be optimal for a distributed system with n nodes even in the absence of delays.

### A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates

• Computer Science
ALT
• 2020
Tight finite-time convergence bounds are provided for gradient descent and stochastic gradient descent on quadratic functions when the gradients are delayed and reflect iterates from τ rounds ago. The results indicate that the performance of gradient descent with delays is competitive with synchronous approaches such as mini-batching.

### Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

• Computer Science
SIAM J. Optim.
• 2017
Using the perturbed iterate framework, this work provides new analyses of the Hogwild! algorithm and asynchronous stochastic coordinate descent, that are simpler than earlier analyses, remove many assumptions of previous models, and in some cases yield improved upper bounds on the convergence rates.

### An asynchronous mini-batch algorithm for regularized stochastic optimization

• Computer Science
2015 54th IEEE Conference on Decision and Control (CDC)
• 2015
This work proposes an asynchronous mini-batch algorithm for regularized stochastic optimization problems that eliminates idle waiting and allows workers to run at their maximal update rates, enjoying near-linear speedup if the number of workers is O(1/√ϵ).

### A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates

• Computer Science, Mathematics
ICML
• 2018
This paper introduces a simple stochastic variance reduced algorithm (MiG), which enjoys the best-known convergence rates for both strongly convex and non-strongly convex problems, and presents its efficient sparse and asynchronous variants, and theoretically analyze its convergence rates in these settings.

### The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Updates

The results show that SGD is robust to compressed and/or delayed stochastic gradient updates, which is particularly important for distributed parallel implementations, where asynchronous and communication-efficient methods are key to achieving linear speedups for optimization with multiple devices.

### Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care

• Computer Science
NIPS
• 2015
We show that asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under…

### Randomized Smoothing for Stochastic Optimization

• Computer Science, Mathematics
SIAM J. Optim.
• 2012
We analyze convergence rates of stochastic optimization procedures for non-smooth convex optimization problems. By combining randomized smoothing techniques with accelerated gradient methods, we…

### Improved asynchronous parallel optimization analysis for stochastic incremental methods

• Computer Science
J. Mach. Learn. Res.
• 2018
It is proved that ASAGA and KROMAGNON can obtain a theoretical linear speedup on multi-core systems even without sparsity assumptions, and the overlap constant, an ill-understood but central quantity in the theoretical analysis of asynchronous parallel algorithms, is investigated.

### An optimal method for stochastic composite optimization

The accelerated stochastic approximation (AC-SA) algorithm, based on Nesterov’s optimal method for smooth convex programming (CP), is introduced, and it is shown that the AC-SA algorithm can achieve the aforementioned lower bound on the rate of convergence for stochastic composite optimization (SCO).