# Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

@inproceedings{Woodworth2018GraphOM, title={Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization}, author={Blake E. Woodworth and Jialei Wang and H. B. McMahan and Nathan Srebro}, booktitle={NeurIPS}, year={2018} }

We suggest a general oracle-based framework that captures parallel stochastic optimization in different parallelization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds to study several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where…

## 74 Citations

Lower Bounds for Parallel and Randomized Convex Optimization

- Computer Science, Mathematics
- COLT
- 2019

We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation. We show that the answer…

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

- Computer Science, Mathematics
- COLT
- 2021

This work presents a novel lower bound with a matching upper bound that establishes an optimal algorithm in the intermittent communication setting, where M machines work in parallel over the course of R rounds of communication to optimize the objective.

Parallelization does not Accelerate Convex Optimization: Adaptivity Lower Bounds for Non-smooth Convex Minimization

- Computer Science, Mathematics
- ArXiv
- 2018

A tight lower bound is given showing that, even when $\texttt{poly}(n)$ queries can be executed in parallel, no randomized algorithm with $\tilde{o}(n^{1/3})$ rounds of adaptivity has a convergence rate better than that achievable with a one-query-per-round algorithm.

An Accelerated Second-Order Method for Distributed Stochastic Optimization

- Mathematics
- 2021

We consider distributed stochastic optimization problems that are solved with master/workers computation architecture. Statistical arguments allow to exploit statistical similarity and approximate…

Towards Optimal Convergence Rate in Decentralized Stochastic Training

- Computer Science
- ArXiv
- 2020

A tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting is provided and DeFacto is proposed, a class of algorithms that converge at the optimal rate without additional theoretical assumptions.

Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems

- Computer Science, Mathematics
- 2019

By using a mini-batching technique, it is shown that the proposed methods with a stochastic oracle can be additionally parallelized at each node, which can be applied to many data science problems and inverse problems.

Improved Communication Lower Bounds for Distributed Optimisation

- Computer Science
- ArXiv
- 2020

It is shown that $\Omega( Nd \log d / \varepsilon)$ bits in total need to be communicated between the machines to find an additive $\epsilon$-approximation to the minimum of $\sum_{i = 1}^N f_i (x)$.

The Complexity of Making the Gradient Small in Stochastic Convex Optimization

- Computer Science, Mathematics
- COLT
- 2019

It is shown that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.

A Stochastic Newton Algorithm for Distributed Convex Optimization

- Computer Science, Mathematics
- ArXiv
- 2021

It is shown that this stochastic Newton algorithm can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance, by proving convergence guarantees for quasi-self-concordant objectives (e.g., logistic regression), alongside empirical evidence.

Never Go Full Batch (in Stochastic Convex Optimization)

- Computer Science, Mathematics
- ArXiv
- 2021

A new separation result is provided showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within $\epsilon$ after $O(1/\epsilon^2)$ iterations, full-batch methods either need at least $\Omega(1/\epsilon^4)$ iterations or exhibit a dimension-dependent sample complexity.

## References

SHOWING 1-10 OF 30 REFERENCES

Tight Complexity Bounds for Optimizing Composite Objectives

- Computer Science, Mathematics
- NIPS
- 2016

For smooth functions, it is shown that accelerated gradient descent and an accelerated variant of SVRG are optimal in the deterministic and randomized settings respectively, and that a gradient oracle is sufficient for the optimal rate.

Communication Complexity of Distributed Convex Learning and Optimization

- Computer Science, Mathematics
- NIPS
- 2015

The results indicate that without similarity between the local objective functions (due to statistical data similarity or otherwise) many communication rounds may be required, even if the machines have unbounded computational power.

Minimax Bounds on Stochastic Batched Convex Optimization

- Computer Science
- COLT
- 2018

Lower and upper bounds on the performance of such batched convex optimization algorithms in zeroth and first-order settings for Lipschitz convex and smooth strongly convex functions are provided.

Information-theoretic lower bounds for distributed statistical estimation with communication constraints

- Computer Science, Mathematics
- NIPS
- 2013

Lower bounds on minimax risks for distributed statistical estimation under a communication budget are established for several problems, including various types of location models, as well as for parameter estimation in regression models.

Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox

- Computer Science
- COLT
- 2017

This work presents and analyzes an approach for distributed stochastic optimization that is statistically optimal and achieves near-linear speedups (up to logarithmic factors). It also provides a novel analysis of the minibatch-prox procedure, which achieves the statistically optimal rate regardless of minibatch size and smoothness, thus significantly improving on prior work.

AdaDelay: Delay Adaptive Distributed Stochastic Optimization

- Computer Science
- AISTATS
- 2016

Stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients are developed, with noticeable improvements for large-scale real datasets with billions of examples and features.

Distributed delayed stochastic optimization

- Computer Science, Mathematics
- 2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
- 2012

This work shows n-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as $O(1/\sqrt{nT})$ after T iterations, known to be optimal for a distributed system with n nodes even in the absence of delays.

An asynchronous mini-batch algorithm for regularized stochastic optimization

- Computer Science, Mathematics
- 2015 54th IEEE Conference on Decision and Control (CDC)
- 2015

This work proposes an asynchronous mini-batch algorithm for regularized stochastic optimization problems that eliminates idle waiting, allows workers to run at their maximal update rates, and enjoys near-linear speedup if the number of workers is $O(1/\sqrt{\epsilon})$.

Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning

- Computer Science
- NIPS
- 2014

This work analyzes new online gradient descent algorithms for distributed systems with large delays between gradient computations and the corresponding updates and gives an impractical algorithm that achieves a regret bound that precisely quantifies the impact of the delays.

Lower Bound for Randomized First Order Convex Optimization

- Mathematics
- 2017

We provide an explicit construction and direct proof for the lower bound on the number of first order oracle accesses required for a randomized algorithm to minimize a convex Lipschitz function.