# ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

    @inproceedings{Kovalev2021ADOMAD,
      title     = {ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks},
      author    = {D. Kovalev and Egor Shulgin and Peter Richt{\'a}rik and Alexander Rogozin and Alexander V. Gasnikov},
      booktitle = {ICML},
      year      = {2021}
    }

We propose ADOM – an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends only on the network structure, its communication complexity matches that of the accelerated gradient method of Nesterov (2003). To the best of our knowledge, only the algorithm of Rogozin et al. (2019…
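The dual oracle can be illustrated on a toy loss. As a minimal sketch (not the paper's algorithm, and with hypothetical function names), consider a 1-D quadratic loss f(x) = (a/2)x² + bx, whose Fenchel conjugate and conjugate gradient have closed forms:

```python
# Minimal sketch of a dual oracle for a toy 1-D quadratic loss
# f(x) = (a/2) * x**2 + b * x with a > 0. (Hypothetical illustration;
# ADOM assumes such an oracle for each node's individual loss function.)
#
# The Fenchel conjugate is f*(y) = sup_x { y*x - f(x) } = (y - b)**2 / (2*a),
# and its gradient is the maximizer itself: grad f*(y) = (y - b) / a.

def conjugate_value(y: float, a: float, b: float) -> float:
    """Fenchel conjugate f*(y) of the quadratic above."""
    return (y - b) ** 2 / (2 * a)

def conjugate_gradient(y: float, a: float, b: float) -> float:
    """Dual oracle: gradient of f*, i.e. argmax_x { y*x - f(x) }."""
    return (y - b) / a
```

The identity ∇f*(y) = argmax_x {⟨y, x⟩ − f(x)} is what makes the dual oracle meaningful in practice: evaluating it amounts to solving a local subproblem at each node.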

## Figures and Tables from this paper

## 10 Citations

Strongly Convex Decentralized Optimization Over Time-Varying Networks

- Computer Science
- 2021

This work designs two optimal algorithms: one is a variant of the recently proposed ADOM enhanced via a multi-consensus subroutine, and the other is a novel algorithm, ADOM+, which is optimal when access to the primal gradients is assumed.

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

- Computer Science, NeurIPS
- 2021

This work designs two optimal algorithms: one is a variant of the recently proposed ADOM enhanced via a multi-consensus subroutine, and the other is a novel algorithm, ADOM+, which is optimal when access to the primal gradients is assumed.

Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks∗

- Computer Science
- 2021

This work studies saddle-point problems of sum type, where the summands are held by separate computational entities connected by a network; it obtains lower complexity bounds for algorithms in this setup and develops optimal methods that meet the lower bounds.

Near-Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks

- Computer Science, Optimization and Applications
- 2021

This work studies saddle-point problems of sum type, where the summands are held by separate computational entities connected by a network; it obtains lower complexity bounds for algorithms in this setup and develops near-optimal methods that meet the lower bounds.

Achieving Efficient Distributed Machine Learning Using a Novel Non-Linear Class of Aggregation Functions

- Computer Science, ArXiv
- 2022

This paper proposes a novel non-linear class of model aggregation functions for achieving efficient DML over time-varying networks and rigorously proves convergence properties of the WPM, a weighted power-p mean in which p is a positive integer.
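For intuition only, an aggregation rule of the kind this summary describes can be sketched as follows (hypothetical function name; assumes non-negative inputs and normalized weights, which is one standard setting for power means):

```python
# Sketch of a weighted power-p mean (WPM) aggregation as summarized above:
#   WPM_p(x; w) = (sum_i w_i * x_i**p) ** (1/p), with p a positive integer.
# Assumes non-negative values and weights summing to 1; p = 1 recovers the
# ordinary weighted average used in standard model averaging.

def weighted_power_mean(values, weights, p):
    assert p >= 1 and abs(sum(weights) - 1.0) < 1e-9
    assert all(v >= 0 for v in values)
    return sum(w * v ** p for w, v in zip(weights, values)) ** (1.0 / p)
```

Larger p shifts the aggregate toward the larger inputs, which is the non-linearity the cited paper exploits.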

Optimal Gradient Tracking for Decentralized Optimization

- Computer Science
- 2021

Optimal Gradient Tracking (OGT) is the first single-loop decentralized gradient-type method that is optimal in both gradient computation and communication complexities.

Distributed gradient-based optimization in the presence of dependent aperiodic communication

- Computer Science, Mathematics, ArXiv
- 2022

It is shown that convergence is guaranteed provided the random variables associated with the age-of-information (AoI) processes are stochastically dominated by a random variable with a finite first moment, which improves on previous requirements of boundedness of more than the first moment.

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

- Computer Science
- 2022

This work introduces ProxSkip – a surprisingly simple and provably efficient method for minimizing the sum of a smooth function and an expensive non-smooth proximable function – and obtains a provable and large improvement without any heterogeneity-bounding assumptions.

Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

- Computer Science, ArXiv
- 2021

The widely used accelerated gradient tracking method is revisited and extended to time-varying graphs, where the dependence on the network connectivity constants can be further improved to O(1) and O(γ/(1−σ_γ)) for the computation and communication complexities, respectively.

Recent theoretical advances in decentralized distributed convex optimization.

- Computer Science
- 2020

This paper focuses on how the results of decentralized distributed convex optimization can be explained based on optimal algorithms for the non-distributed setup, and provides recent results that have not been published yet.

## References

SHOWING 1-10 OF 37 REFERENCES

Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks

- Computer Science, ICML
- 2017

The efficiency of MSDA against state-of-the-art methods for two problems: least-squares regression and classification by logistic regression is verified.

Multi-consensus Decentralized Accelerated Gradient Descent

- Computer Science, ArXiv
- 2020

A novel algorithm is proposed that can achieve near optimal communication complexity, matching the known lower bound up to a logarithmic factor of the condition number of the problem.

An $O(1/k)$ Gradient Method for Network Resource Allocation Problems

- Computer Science, IEEE Transactions on Control of Network Systems
- 2014

This paper develops a completely distributed fast gradient method for solving the dual of the NUM problem, and shows that the generated primal sequences converge to the unique optimal solution of the NUM problem at rate O(1/k).

Push–Pull Gradient Methods for Distributed Optimization in Networks

- Computer Science, IEEE Transactions on Automatic Control
- 2021

The “push–pull” methods constitute the first class of algorithms for distributed optimization over directed graphs with strongly convex and smooth objective functions; they outperform other existing linearly convergent schemes, especially for ill-conditioned problems and networks that are not well balanced.

PANDA: A Dual Linearly Converging Method for Distributed Optimization Over Time-Varying Undirected Graphs

- Mathematics, Computer Science, 2018 IEEE Conference on Decision and Control (CDC)
- 2018

A dual method is proposed that converges R-linearly to the optimal point given that the agents' objective functions are strongly convex and have Lipschitz continuous gradients.

Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs

- Mathematics, Computer Science, SIAM J. Optim.
- 2017

This paper introduces a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique that converges to a global and consensual minimizer over time-varying graphs.

Optimal Distributed Convex Optimization on Slowly Time-Varying Graphs

- Computer Science, IEEE Transactions on Control of Network Systems
- 2020

A sufficient condition is provided that guarantees a convergence rate with optimal (up to logarithmic terms) dependencies on the network and function parameters if the network changes are constrained to a small percentage of the total number of iterations.

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

- Computer Science, AISTATS
- 2021

This work proposes a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators to the communicated messages and obtains the first scheme that converges linearly on strongly convex decentralized problems while using compressed communication only.

Accelerated gradient methods and dual decomposition in distributed model predictive control

- Computer Science, Autom.
- 2013

A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods

- Computer Science
- 2018

Two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters are presented, which achieve near-optimal complexities for both computation and communication.