Corpus ID: 235592902

Asynchronous Stochastic Optimization Robust to Arbitrary Delays

@inproceedings{Cohen2021AsynchronousSO,
  title={Asynchronous Stochastic Optimization Robust to Arbitrary Delays},
  author={Alon Cohen and Amit Daniely and Yoel Drori and Tomer Koren and Mariano Schain},
  booktitle={NeurIPS},
  year={2021}
}
We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for some arbitrary delay $d_t$. This setting abstracts asynchronous distributed optimization, where a central server receives gradient updates computed by worker machines. These machines can experience computation and communication loads that might vary significantly over time. In the general non-convex smooth optimization…
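To make the setting concrete, here is a minimal sketch in Python of SGD with stale gradients. It illustrates only the delayed-update abstraction described above; the function name, the fixed stepsize, and the toy objective and delay sequence are illustrative assumptions, not the algorithm analyzed in the paper.

import numpy as np

def delayed_sgd(grad_oracle, x0, delays, eta=0.01, T=1000):
    # SGD where the update at step t applies a stochastic gradient
    # evaluated at the iterate from step t - d_t (here d_t = delays[t]).
    x = np.array(x0, dtype=float)
    history = [x.copy()]                    # stored iterates x_0, ..., x_t
    for t in range(T):
        s = max(0, t - delays[t])           # index of the stale iterate
        g = grad_oracle(history[s])         # stochastic gradient at x_{t - d_t}
        x = x - eta * g
        history.append(x.copy())
    return x

# Toy run: noisy gradients of f(x) = ||x||^2 / 2 with arbitrary (here random) delays.
rng = np.random.default_rng(0)
grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
delays = rng.integers(0, 20, size=1000)
print(delayed_sgd(grad, np.ones(5), delays))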

Citations

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

This paper studies asynchronous stochastic gradient descent for distributed training over n workers whose computation and communication frequencies vary over time, and shows for the first time that asynchronous SGD is always faster than mini-batch SGD.

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

This work introduces a novel recursion based on "virtual iterates" and delay-adaptive stepsizes, which allows it to derive state-of-the-art guarantees for both convex and non-convex objectives.
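For intuition only, a delay-adaptive stepsize can be as simple as shrinking a base rate with the staleness of the gradient being applied. The Python rule below is a hypothetical illustration, not necessarily the schedule derived in that paper; plugged into the delayed-SGD sketch above, it would replace the fixed eta with delay_adaptive_step(eta, delays[t]).

def delay_adaptive_step(eta0, d_t):
    # Hypothetical delay-adaptive rule: the staler the applied gradient
    # (larger d_t), the smaller the step, so that bursts of long delays
    # cannot destabilize the updates.
    return eta0 / (1.0 + d_t)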

Distributed Distributionally Robust Optimization with Non-Convex Objectives

An asynchronous distributed algorithm named ASPIRE, combined with the EASE method, is proposed to tackle the distributed distributionally robust optimization (DDRO) problem; it not only achieves fast convergence and remains robust against data heterogeneity as well as malicious attacks, but can also trade off robustness against performance.

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

This paper presents the first algorithms that achieve near-optimal $\sqrt{K + D}$ regret, where $K$ is the number of episodes and $D = \sum_{k=1}^{K} d_k$ is the total delay, significantly improving upon the best known regret bound of $(K + D)^{2/3}$.

References

Showing 1-10 of 31 references

Distributed delayed stochastic optimization

This work shows n-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as O(1/√(nT)) after T iterations, a rate known to be optimal for a distributed system with n nodes even in the absence of delays.

A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates

Tight finite-time convergence bounds are provided for gradient descent and stochastic gradient descent on quadratic functions when the gradients are delayed and reflect iterates from $\tau$ rounds ago. The results indicate that the performance of gradient descent with delays is competitive with synchronous approaches such as mini-batching.

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

Using the perturbed iterate framework, this work provides new analyses of the Hogwild! algorithm and asynchronous stochastic coordinate descent that are simpler than earlier analyses, remove many assumptions of previous models, and in some cases yield improved upper bounds on convergence rates.
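As a rough illustration of the lock-free pattern that the perturbed iterate framework analyzes, the Python toy below lets several threads read and update a shared iterate with no locking, so an update may act on a slightly perturbed view of the iterate. It is a structural sketch only (dense NumPy, deterministic gradient); Hogwild! itself targets sparse problems and is not reproduced here.

import threading
import numpy as np

def hogwild_style_sgd(grad_oracle, x, eta=0.01, n_workers=4, steps_per_worker=250):
    # Workers update the shared vector x in place without synchronization,
    # so reads may interleave with other workers' writes ("perturbed" iterates).
    def worker():
        for _ in range(steps_per_worker):
            g = grad_oracle(x)        # read shared x, possibly mid-update
            x[:] -= eta * g           # unsynchronized in-place write
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return x

grad = lambda v: v                    # gradient of f(v) = ||v||^2 / 2
print(hogwild_style_sgd(grad, np.ones(5)))   # converges toward the origin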

An asynchronous mini-batch algorithm for regularized stochastic optimization

This work proposes an asynchronous mini-batch algorithm for regularized stochastic optimization problems that eliminates idle waiting, allows workers to run at their maximal update rates, and enjoys near-linear speedup if the number of workers is O(1/√ε).

A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates

This paper introduces a simple stochastic variance reduced algorithm (MiG) that enjoys the best-known convergence rates for both strongly convex and non-strongly convex problems, presents its efficient sparse and asynchronous variants, and theoretically analyzes their convergence rates in these settings.
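MiG itself is not reproduced here, but the SVRG-style variance reduction idea that such methods build on can be sketched in a few lines of Python; the toy least-squares objective, stepsize, and epoch counts below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]      # gradient of the i-th summand
full_grad = lambda x: A.T @ (A @ x - b) / len(b)    # full gradient of the average

def svrg_style(x, eta=0.02, n_epochs=20, m=100):
    # Generic SVRG-style loop (a sketch, not MiG): each inner step corrects
    # a stochastic gradient with a control variate built from a full
    # gradient computed at a snapshot point.
    for _ in range(n_epochs):
        snapshot, mu = x.copy(), full_grad(x)
        for _ in range(m):
            i = rng.integers(len(b))
            x = x - eta * (grad_i(x, i) - grad_i(snapshot, i) + mu)
    return x

print(svrg_style(np.zeros(5)))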

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Updates

The results show that SGD is robust to compressed and/or delayed stochastic gradient updates; this is particularly important for distributed parallel implementations, where asynchronous and communication-efficient methods are key to achieving linear speedups for optimization across multiple devices.
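The error-feedback mechanism itself is easy to sketch generically: the part of an update removed by compression is kept in a memory vector and added back before the next compression, so no gradient information is permanently discarded. The Python sketch below uses a simple top-k compressor and illustrates the general pattern, not the exact scheme analyzed in that paper.

import numpy as np

def top_k(v, k):
    # Keep only the k largest-magnitude entries (a simple biased compressor).
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_sgd(grad_oracle, x0, eta=0.05, k=2, T=500):
    # Error-feedback SGD: compression error is accumulated in `memory`
    # and re-injected into the next update.
    x = np.array(x0, dtype=float)
    memory = np.zeros_like(x)
    for _ in range(T):
        p = eta * grad_oracle(x) + memory   # add back previously dropped mass
        delta = top_k(p, k)                 # transmit only the compressed part
        memory = p - delta                  # remember what was dropped
        x = x - delta
    return x

rng = np.random.default_rng(3)
grad = lambda x: x + 0.05 * rng.standard_normal(x.shape)
print(ef_sgd(grad, np.ones(5)))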

Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care

We show that asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under…

Randomized Smoothing for Stochastic Optimization

We analyze convergence rates of stochastic optimization procedures for non-smooth convex optimization problems. By combining randomized smoothing techniques with accelerated gradient methods, we…

Improved asynchronous parallel optimization analysis for stochastic incremental methods

It is proved that ASAGA and KROMAGNON can obtain a theoretical linear speedup on multi-core systems even without sparsity assumptions, and the overlap constant, an ill-understood but central quantity in the theoretical analysis of asynchronous parallel algorithms, is investigated.

An optimal method for stochastic composite optimization

The accelerated stochastic approximation (AC-SA) algorithm based on Nesterov's optimal method for smooth convex programming is introduced, and it is shown that the AC-SA algorithm can achieve the aforementioned lower bound on the rate of convergence for stochastic composite optimization (SCO).