Corpus ID: 231951603

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

@inproceedings{Kovalev2021ADOMAD,
  title={ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks},
  author={D. Kovalev and Egor Shulgin and Peter Richt{\'a}rik and Alexander Rogozin and Alexander V. Gasnikov},
  booktitle={ICML},
  year={2021}
}
We propose ADOM – an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends on the network structure only, its communication complexity is the same as that of Nesterov's accelerated gradient method (Nesterov, 2003). To the best of our knowledge, only the algorithm of Rogozin et al. (2019… 
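
A dual oracle here means access to the gradient of the Fenchel conjugate f_i*(y) = sup_x { <y, x> - f_i(x) }; for a smooth, strongly convex f_i this gradient is simply the maximizer of the inner problem. Below is a minimal sketch of such an oracle, assuming a hypothetical quadratic loss f_i(x) = 0.5 x^T A_i x - b_i^T x with A_i positive definite (an illustrative example, not taken from the paper), for which the conjugate gradient has the closed form A_i^{-1}(y + b_i):

import numpy as np

def dual_oracle(A_i, b_i, y):
    """Gradient of the Fenchel conjugate of f_i(x) = 0.5 x^T A_i x - b_i^T x.

    For strongly convex f_i, grad f_i*(y) = argmax_x { <y, x> - f_i(x) },
    which for this quadratic loss is A_i^{-1} (y + b_i).
    """
    return np.linalg.solve(A_i, y + b_i)

# Sanity check: the oracle output x satisfies grad f_i(x) = y.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A_i = M @ M.T + np.eye(3)              # symmetric positive definite
b_i = rng.standard_normal(3)
y = rng.standard_normal(3)
x = dual_oracle(A_i, b_i, y)
assert np.allclose(A_i @ x - b_i, y)   # grad f_i(x) = A_i x - b_i = y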

Citations

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks
TLDR
This work designs two optimal algorithms: one is a variant of the recently proposed algorithm ADOM, enhanced via a multi-consensus subroutine, and the other is a novel algorithm, called ADOM+, which is optimal in the case when access to the primal gradients is assumed.
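
The multi-consensus subroutine mentioned above amounts to running several gossip (mixing) rounds between gradient updates, so that the effective mixing step is much closer to exact averaging. A minimal sketch under assumed notation (W is a doubly stochastic mixing matrix of the current graph, the rows of X are the nodes' local vectors; an illustration, not the authors' code):

import numpy as np

def multi_consensus(X, W, num_rounds):
    """Apply num_rounds gossip rounds X <- W X.

    Repeated mixing drives every row of X toward the network average,
    which is what enhancing a method via multi-consensus refers to.
    """
    for _ in range(num_rounds):
        X = W @ X
    return X

# Example: with a doubly stochastic W, all rows approach the average (2.0).
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
X = np.array([[1.0], [2.0], [3.0]])
print(multi_consensus(X, W, num_rounds=20))
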
Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks∗
TLDR
This work studies saddle point problems of sum type, where the summands are held by separate computational entities connected by a network, obtains lower complexity bounds for algorithms in this setup, and develops optimal methods that meet the lower bounds.
Near-Optimal Decentralized Algorithms for Saddle Point Problems over Time-Varying Networks
TLDR
This work studies saddle point problems of sum type, where the summands are held by separate computational entities connected by a network, obtains lower complexity bounds for algorithms in this setup, and develops near-optimal methods that meet the lower bounds.
Achieving Efficient Distributed Machine Learning Using a Novel Non-Linear Class of Aggregation Functions
TLDR
This paper proposes a novel non-linear class of model aggregation functions to achieve efficient DML over time-varying networks and rigorously proves convergence properties of the WPM, a weighted power-p mean where p is a positive integer.
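
For reference, a weighted power-p mean of parameter vectors x_1, ..., x_n with weights w_i summing to one is (sum_i w_i * x_i^p)^(1/p), applied coordinate-wise; p = 1 recovers the usual weighted average. A minimal sketch of such an aggregation function (illustrative only; the paper's exact WPM aggregation may differ):

import numpy as np

def weighted_power_mean(models, weights, p):
    """Weighted power-p mean: (sum_i w_i * x_i**p) ** (1/p), coordinate-wise.

    models is an (n, d) array of n parameter vectors with non-negative
    entries (as is standard for power means), weights sum to 1, and p is
    a positive integer.
    """
    models = np.asarray(models, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * models**p).sum(axis=0) ** (1.0 / p)

# p = 1 is the weighted average; larger p puts more weight on large values.
models = [[1.0, 4.0], [3.0, 2.0]]
print(weighted_power_mean(models, weights=[0.5, 0.5], p=1))  # [2. 3.]
print(weighted_power_mean(models, weights=[0.5, 0.5], p=3))
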
Optimal Gradient Tracking for Decentralized Optimization
TLDR
Optimal Gradient Tracking (OGT) is the first single-loop decentralized gradient-type method that is optimal in both gradient computation and communication complexities.
Distributed gradient-based optimization in the presence of dependent aperiodic communication
TLDR
It is shown that convergence is guaranteed provided the random variables associated with the AoI (age-of-information) processes are stochastically dominated by a random variable with a finite first moment, which relaxes previous requirements that moments beyond the first be bounded.
ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
TLDR
This work introduces ProxSkip, a surprisingly simple and provably efficient method for minimizing the sum of a smooth function and an expensive nonsmooth proximable function, and obtains a provable and large improvement in communication without any heterogeneity-bounding assumptions.
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization
TLDR
The widely used accelerated gradient tracking is revisited and extended to time-varying graphs, and the dependence on the network connectivity constants can be further improved to O(1) and O(γ/(1−σ_γ)) for the computation and communication complexities, respectively.
Recent theoretical advances in decentralized distributed convex optimization.
TLDR
This paper focuses on how the results of decentralized distributed convex optimization can be explained based on optimal algorithms for the non-distributed setup, and provides recent results that have not been published yet.

References

Showing 1-10 of 37 references
Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
TLDR
The efficiency of MSDA is verified against state-of-the-art methods on two problems: least-squares regression and classification by logistic regression.
Multi-consensus Decentralized Accelerated Gradient Descent
TLDR
A novel algorithm is proposed that can achieve near optimal communication complexity, matching the known lower bound up to a logarithmic factor of the condition number of the problem.
An $O(1/k)$ Gradient Method for Network Resource Allocation Problems
TLDR
This paper develops a completely distributed fast gradient method for solving the dual of the NUM (network utility maximization) problem, and shows that the generated primal sequences converge to the unique optimal solution of the NUM problem at rate O(1/k).
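
The "fast gradient" referred to here is Nesterov-type acceleration applied to the smooth dual of the resource allocation problem. A minimal sketch of the generic accelerated scheme it builds on, assuming a gradient callable and a known smoothness constant L (illustrative, not the paper's distributed implementation):

import numpy as np

def fast_gradient(grad, x0, L, num_iters):
    """Nesterov's fast gradient method for an L-smooth convex objective."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(num_iters):
        x_next = y - grad(y) / L                    # gradient step at the extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2   # momentum schedule
        y = x_next + ((t - 1) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

# Example: minimize 0.5 * ||x - c||^2 (L = 1); the iterates converge to c.
c = np.array([1.0, -2.0])
print(fast_gradient(lambda x: x - c, np.zeros(2), L=1.0, num_iters=50))
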
Push–Pull Gradient Methods for Distributed Optimization in Networks
TLDR
“Push–pull” is the first class of algorithms for distributed optimization of strongly convex and smooth objective functions over directed graphs, and it outperforms other existing linearly convergent schemes, especially for ill-conditioned problems and networks that are not well balanced.
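
The push-pull structure pairs a row-stochastic matrix R, used to mix ("pull") the decision variables, with a column-stochastic matrix C, used to spread ("push") gradient information, which is what accommodates directed, unbalanced graphs. A minimal single-iteration sketch under assumed notation (an illustration of the idea, not the authors' code; details of the original method may differ):

import numpy as np

def push_pull_step(X, Y, grads_prev, R, C, grad_fn, alpha):
    """One push-pull style iteration on a directed graph.

    X, Y are (n, d) arrays of local iterates and gradient trackers;
    grad_fn(X) returns the (n, d) array of local gradients at X.
    """
    X_next = R @ (X - alpha * Y)                  # row-stochastic mixing of the iterates
    grads_next = grad_fn(X_next)
    Y_next = C @ Y + grads_next - grads_prev      # column-stochastic gradient tracking
    return X_next, Y_next, grads_next
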
PANDA: A Dual Linearly Converging Method for Distributed Optimization Over Time-Varying Undirected Graphs
  • M. Maros, J. Jaldén, 2018 IEEE Conference on Decision and Control (CDC), 2018
TLDR
A dual method is proposed that converges R-linearly to the optimal point given that the agents' objective functions are strongly convex and have Lipschitz continuous gradients.
Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
TLDR
This paper introduces a distributed algorithm, referred to as DIGing, which combines a distributed inexact gradient method with a gradient tracking technique and converges to a global and consensual minimizer over time-varying graphs.
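
In contrast to the push-pull sketch above, a DIGing-style update uses a single (possibly time-varying) doubly stochastic mixing matrix W_k for both the iterates and the gradient-tracking variable: x_{k+1} = W_k x_k - alpha * y_k and y_{k+1} = W_k y_k + grad f(x_{k+1}) - grad f(x_k). A minimal sketch under assumed notation (not the authors' code):

import numpy as np

def diging_step(X, Y, grads_prev, W_k, grad_fn, alpha):
    """One DIGing-style iteration with a time-varying mixing matrix W_k.

    X, Y are (n, d) arrays of local iterates and gradient trackers;
    grad_fn(X) returns the (n, d) array of local gradients at X.
    """
    X_next = W_k @ X - alpha * Y                  # mix, then descend along the tracker
    grads_next = grad_fn(X_next)
    Y_next = W_k @ Y + grads_next - grads_prev    # track the average gradient
    return X_next, Y_next, grads_next
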
Optimal Distributed Convex Optimization on Slowly Time-Varying Graphs
TLDR
A sufficient condition is provided that guarantees a convergence rate with optimal (up to logarithmic terms) dependencies on the network and function parameters if the network changes are constrained to a small percentage of the total number of iterations.
A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!
TLDR
This work proposes a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators to the communicated messages and obtains the first scheme that converges linearly on strongly convex decentralized problems while using compressed communication only.
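
A randomized compression operator of the kind referred to here is, for example, rand-k sparsification: keep k randomly chosen coordinates of the message and rescale them so the compressed vector is unbiased. A minimal sketch (illustrative; the specific operators studied in the paper may differ):

import numpy as np

def rand_k(x, k, rng):
    """Unbiased rand-k sparsification: keep k random coordinates, scaled by d/k."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = (d / k) * x[idx]
    return out

# Example: on average over the randomness, compressed messages equal the original vector.
rng = np.random.default_rng(0)
print(rand_k(np.array([1.0, 2.0, 3.0, 4.0]), k=2, rng=rng))
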
A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods
TLDR
Two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters are presented, which achieve near-optimal complexities for both computation and communication.