Towards Accelerated Rates for Distributed Optimization over Time-Varying Networks

@inproceedings{Rogozin2021TowardsAR,
  title={Towards Accelerated Rates for Distributed Optimization over Time-Varying Networks},
  author={Alexander Rogozin and Vladislav Lukoshkin and Alexander V. Gasnikov and D. Kovalev and Egor Shulgin},
  booktitle={OPTIMA},
  year={2021}
}
We study the problem of decentralized optimization over time-varying networks with strongly convex, smooth cost functions. In our approach, nodes run a multi-step gossip procedure after each gradient update, thus ensuring approximate consensus at every iteration, while the outer loop is based on an accelerated Nesterov scheme. The algorithm achieves precision $\varepsilon > 0$ in $O(\sqrt{\kappa_g}\chi\log^2(1/\varepsilon))$ communication steps and $O(\sqrt{\kappa_g}\log(1/\varepsilon…
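The structure the abstract describes, an accelerated gradient outer loop whose iterates are re-averaged by several gossip rounds, can be pictured with a minimal Python sketch. This is not the authors' exact algorithm: the function names, the $1/L$ step size, the standard strongly convex momentum coefficient, and the fixed number of gossip steps `T` per iteration are all illustrative assumptions.

```python
import numpy as np

def multi_step_gossip(X, mixing_matrices, T):
    """Run T averaging (gossip) steps with time-varying doubly stochastic
    mixing matrices; each row of X is one node's local iterate."""
    for t in range(T):
        W = mixing_matrices[t % len(mixing_matrices)]
        X = W @ X
    return X

def decentralized_nesterov(grads, X0, mixing_matrices, L, mu, T, n_iters):
    """Outer accelerated (Nesterov-type) loop with a multi-step gossip
    round after every local gradient update."""
    n, _ = X0.shape
    X, Y = X0.copy(), X0.copy()
    alpha = 1.0 / L                                                   # gradient step size
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))    # momentum for the strongly convex case
    for _ in range(n_iters):
        G = np.stack([grads[i](Y[i]) for i in range(n)])              # local gradients at extrapolated points
        X_next = multi_step_gossip(Y - alpha * G, mixing_matrices, T) # gradient step, then approximate consensus
        Y = X_next + beta * (X_next - X)                              # Nesterov extrapolation
        X = X_next
    return X.mean(axis=0)
```

Roughly, the number of gossip steps per iteration is chosen so that the consensus error stays within the accuracy the outer accelerated loop can tolerate, which is where the multiplicative network factor $\chi$ in the communication complexity comes from.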
An Accelerated Method For Decentralized Distributed Stochastic Optimization Over Time-Varying Graphs
TLDR
This work proposes the first accelerated (in the sense of Nesterov's acceleration) method that simultaneously attains communication and oracle complexity bounds that are optimal up to a logarithmic factor for smooth strongly convex distributed stochastic optimization.
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization
TLDR
The widely used accelerated gradient tracking method is revisited and extended to time-varying graphs, and the dependence on the network connectivity constants can be further improved to $O(1)$ and $O\!\left(\frac{\gamma}{1-\sigma_{\gamma}}\right)$ for the computation and communication complexities, respectively.
Newton Method over Networks is Fast up to the Statistical Precision
TLDR
This work proposes a distributed cubic regularization of the Newton method for solving (constrained) empirical risk minimization problems over a network of agents, modeled as undirected graph, and derives global complexity bounds for convex and strongly convex losses.
ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks
TLDR
ADOM uses a dual oracle, i.e., it assumes access to the gradient of the Fenchel conjugate of the individual loss functions, and its communication complexity is the same as that of Nesterov's accelerated gradient method (Nesterov, 2003).
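As a reminder of what the dual oracle means here, the standard definitions are shown below; the notation $f_i$, $f_i^*$ is generic rather than taken from the paper.

```latex
f_i^*(y) \;=\; \sup_{x}\,\bigl\{\langle y, x\rangle - f_i(x)\bigr\},
\qquad
\nabla f_i^*(y) \;=\; \operatorname*{arg\,max}_{x}\,\bigl\{\langle y, x\rangle - f_i(x)\bigr\}.
```

For a $\mu$-strongly convex $f_i$ the maximizer is unique, so the dual gradient is well defined and $f_i^*$ is $(1/\mu)$-smooth; evaluating it amounts to solving a local optimization problem at every call, which is what makes dual oracles costlier than primal ones.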
Recent theoretical advances in decentralized distributed convex optimization.
TLDR
This paper focuses on how the results of decentralized distributed convex optimization can be explained based on optimal algorithms for the non-distributed setup, and provides recent results that have not been published yet.
Parallel and Distributed algorithms for ML problems
TLDR
A survey of modern parallel and distributed approaches to solving sum-type convex minimization problems arising in ML applications is given.
Inexact Tensor Methods and Their Application to Stochastic Convex Optimization
TLDR
A general non-accelerated tensor method under inexact information on higher-order derivatives is proposed, its convergence rate is analyzed, and sufficient conditions are provided for this method to have complexity similar to that of the exact tensor method.
Strongly Convex Decentralized Optimization over Time-Varying Networks
TLDR
This work designs two optimal algorithms: a variant of the recently proposed algorithm ADOM enhanced via a multi-consensus subroutine, and a novel algorithm, called ADOM+, which is optimal in the case when access to the primal gradients is assumed.
Decentralized Saddle-Point Problems with Different Constants of Strong Convexity and Strong Concavity
TLDR
This paper studies distributed saddle-point problems (SPP) with strongly-convex-strongly-concave smooth objectives in which the composite terms corresponding to the min and max variables have different strong convexity and strong concavity parameters, together with a bilinear saddle-point part.
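A generic template of the problem class described above is the following; the symbols $f_i$, $g_i$, $A_i$, $\mu_x$, $\mu_y$ are introduced here for illustration and are not the paper's notation.

```latex
\min_{x}\;\max_{y}\;\; \frac{1}{m}\sum_{i=1}^{m}\Bigl( f_i(x) \;+\; \langle A_i x,\, y\rangle \;-\; g_i(y) \Bigr)
```

Here each composite term $f_i$ is $\mu_x$-strongly convex, each $g_i$ is $\mu_y$-strongly convex (so the objective is strongly concave in $y$), with $\mu_x \neq \mu_y$ in general, and $\langle A_i x, y\rangle$ is the bilinear saddle-point part held by node $i$.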
Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks
TLDR
This work designs two optimal algorithms: a variant of the recently proposed algorithm ADOM enhanced via a multi-consensus subroutine, and a novel algorithm, called ADOM+, which is optimal in the case when access to the primal gradients is assumed.
...

References

SHOWING 1-10 OF 37 REFERENCES
Optimal Accelerated Variance Reduced EXTRA and DIGing for Strongly Convex and Smooth Decentralized Optimization
TLDR
The famous EXTRA and DIGing methods are extended with accelerated variance reduction (VR), and two methods are proposed that reach precision $\epsilon$ with optimal numbers of stochastic gradient evaluations and communication rounds.
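The variance-reduction ingredient these extensions rely on is, in its standard form, an SVRG-type estimator; a minimal sketch of that building block is below. This is the generic estimator, not the VR-EXTRA/VR-DIGing updates themselves, and the function names are illustrative.

```python
import numpy as np

def svrg_gradient(grad_fns, x, x_snap, full_grad_snap, rng):
    """SVRG-type variance-reduced estimator: unbiased for the full gradient,
    with variance that vanishes as x and the snapshot approach the optimum."""
    j = rng.integers(len(grad_fns))                       # pick one component uniformly at random
    return grad_fns[j](x) - grad_fns[j](x_snap) + full_grad_snap

# Typical usage: refresh the snapshot and its full gradient once per epoch.
# x_snap = x.copy()
# full_grad_snap = np.mean([g(x_snap) for g in grad_fns], axis=0)
```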
A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods
TLDR
Two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters are presented, which achieve near-optimal complexities for both computation and communication.
Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization
TLDR
The widely used EXTRA and DIGing methods are extended with variance reduction (VR), and accelerated VR-EXTRA and VR-DIGing with both the optimal stochastic gradient computation complexity and the optimal communication complexity are proposed.
An Optimal Algorithm for Decentralized Finite Sum Optimization
TLDR
A lower bound on the complexity is given to show that ADFS, which uses local stochastic proximal updates and decentralized communications between nodes, is optimal among decentralized algorithms.
Revisiting EXTRA for Smooth Distributed Optimization
TLDR
A sharp complexity analysis of EXTRA under the improved Catalyst framework is given; even when strong convexity is absent, the communication complexities of the accelerated EXTRA are only worse by logarithmic factors.
Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
TLDR
The efficiency of MSDA is verified against state-of-the-art methods on two problems: least-squares regression and classification by logistic regression.
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
TLDR
The error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions, and the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate are provided.
Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
TLDR
This paper introduces a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique that converges to a global and consensual minimizer over time-varying graphs.
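For concreteness, the gradient-tracking recursion that DIGing is built on can be written in a few lines. The sketch below follows the standard update over time-varying mixing matrices; the helper names, the constant step size `alpha`, and cycling through a fixed list of mixing matrices are illustrative assumptions.

```python
import numpy as np

def diging(grads, X0, mixing_matrices, alpha, n_iters):
    """Gradient-tracking recursion of DIGing over time-varying graphs:
    each node mixes with neighbors, steps along a tracked average gradient,
    and updates the tracker with the change in its local gradient."""
    n = X0.shape[0]
    X = X0.copy()
    G = np.stack([grads[i](X[i]) for i in range(n)])   # local gradients at x_0
    Y = G.copy()                                       # gradient tracker, y_0 = grad f(x_0)
    for k in range(n_iters):
        W = mixing_matrices[k % len(mixing_matrices)]  # doubly stochastic W_k
        X_next = W @ X - alpha * Y                     # mix, then step along the tracker
        G_next = np.stack([grads[i](X_next[i]) for i in range(n)])
        Y = W @ Y + G_next - G                         # track the average gradient
        X, G = X_next, G_next
    return X
```

The tracker `Y` maintains a running estimate of the network-wide average gradient, which is what allows geometric convergence with a constant step size even though each node only ever evaluates its own local gradient.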
Accelerated Distributed Nesterov Gradient Descent
  • Guannan Qu, Na Li
  • IEEE Transactions on Automatic Control, 2020
This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication.
A dual approach for optimal algorithms in distributed optimization over networks
TLDR
This work studies dual-based algorithms for distributed convex optimization problems over networks, and proposes distributed algorithms that achieve the same optimal rates as their centralized counterparts (up to constant and logarithmic factors), with an additional optimal cost related to the spectral properties of the network.
...