Towards Accelerated Rates for Distributed Optimization over Time-Varying Networks

@inproceedings{Rogozin2021TowardsAR,
  title={Towards Accelerated Rates for Distributed Optimization over Time-Varying Networks},
  author={Alexander Rogozin and Vladislav Lukoshkin and Alexander V. Gasnikov and D. Kovalev and Egor Shulgin},
  booktitle={OPTIMA},
  year={2021}
}
We study the problem of decentralized optimization over time-varying networks with strongly convex smooth cost functions. In our approach, nodes run a multi-step gossip procedure after each gradient update, thus ensuring approximate consensus at every iteration, while the outer loop follows an accelerated Nesterov scheme. The algorithm achieves precision $\varepsilon > 0$ in $O(\sqrt{\kappa_g}\chi\log^2(1/\varepsilon))$ communication steps and $O(\sqrt{\kappa_g}\log(1/\varepsilon))$ gradient computations.
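A minimal, self-contained sketch of the scheme described in the abstract: an accelerated Nesterov outer loop in which every local gradient step is followed by a multi-step gossip (consensus-averaging) subroutine over a changing network. The quadratic local losses, the random pairwise gossip, the step size $1/L$, the momentum parameter and the choice of 20 gossip rounds are all illustrative assumptions, not the authors' exact algorithm or parameters.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 6, 3                                  # number of nodes, problem dimension

    # Hypothetical local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 (smooth, strongly convex).
    A = [rng.standard_normal((8, d)) + 2 * np.eye(8, d) for _ in range(n)]
    b = [rng.standard_normal(8) for _ in range(n)]

    def grad(i, x):
        # Gradient of the hypothetical local loss f_i.
        return A[i].T @ (A[i] @ x - b[i])

    L = max(np.linalg.norm(A[i].T @ A[i], 2) for i in range(n))           # smoothness constant
    mu = min(np.linalg.eigvalsh(A[i].T @ A[i]).min() for i in range(n))   # strong convexity constant
    kappa = L / mu                                                        # rough stand-in for kappa_g

    def multi_step_gossip(X, T):
        # T rounds of pairwise averaging; a random edge per round mimics a time-varying network.
        for _ in range(T):
            i, j = rng.choice(n, size=2, replace=False)
            X[i] = X[j] = 0.5 * (X[i] + X[j])
        return X

    X = np.zeros((n, d))                         # local iterates, one row per node
    Y = X.copy()                                 # extrapolation (momentum) sequence
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    for k in range(200):
        G = np.stack([grad(i, Y[i]) for i in range(n)])
        X_new = Y - G / L                        # local gradient step at each node
        X_new = multi_step_gossip(X_new, T=20)   # restore approximate consensus
        Y = X_new + beta * (X_new - X)           # Nesterov extrapolation
        X = X_new

    print("consensus spread:", np.linalg.norm(X - X.mean(axis=0)))

In the paper, the number of gossip rounds per outer iteration scales with the network parameter $\chi$ (and logarithmically with the target accuracy), which is what yields the $O(\sqrt{\kappa_g}\chi\log^2(1/\varepsilon))$ communication bound.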

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks
TLDR
ADOM uses a dual oracle, i.e., it assumes access to the gradient of the Fenchel conjugate of the individual loss functions, and its communication complexity is the same as that of accelerated Nesterov gradient method (Nesterov, 2003).
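For context on the dual oracle mentioned in this summary: the Fenchel conjugate of a local loss $f_i$ and its gradient (which the method is assumed to query) are given by the standard identities below; this is textbook convex analysis, not a statement specific to ADOM.

$$
f_i^*(y) \;=\; \sup_{x}\bigl\{\langle y, x\rangle - f_i(x)\bigr\},
\qquad
\nabla f_i^*(y) \;=\; \arg\max_{x}\bigl\{\langle y, x\rangle - f_i(x)\bigr\} \;=\; (\nabla f_i)^{-1}(y),
$$

where the gradient of the conjugate is well defined whenever $f_i$ is strongly convex.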
Parallel and Distributed algorithms for ML problems
TLDR
A survey of modern parallel and distributed approaches to solving sum-type convex minimization problems arising in ML applications.
Strongly Convex Decentralized Optimization over Time-Varying Networks
TLDR
This work designs two optimal algorithms: a variant of the recently proposed algorithm ADOM enhanced via a multi-consensus subroutine, and a novel algorithm, called ADOM+, which is optimal in the case when access to the primal gradients is assumed.
Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks
TLDR
This work designs two optimal algorithms: a variant of the recently proposed algorithm ADOM enhanced via a multi-consensus subroutine, and a novel algorithm, called ADOM+, which is optimal in the case when access to the primal gradients is assumed.
Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization
TLDR
The widely used accelerated gradient tracking is revisited and extended to time-varying graphs, and the dependence on the network connectivity constants can be further improved to $O(1)$ and $O\left(\frac{\gamma}{1-\sigma_\gamma}\right)$ for the computation and communication complexities, respectively.
Acceleration in Distributed Optimization Under Similarity
TLDR
Numerical results show significant communication savings with respect to existing accelerated distributed schemes, especially when solving ill-conditioned problems.
An Accelerated Method For Decentralized Distributed Stochastic Optimization Over Time-Varying Graphs
TLDR
This work proposes the first accelerated (in the sense of Nesterov’s acceleration) method that simultaneously attains communication and oracle complexity bounds that are optimal up to a logarithmic factor for smooth strongly convex distributed stochastic optimization.
Newton Method over Networks is Fast up to the Statistical Precision
TLDR
This work proposes a distributed cubic regularization of the Newton method for solving (constrained) empirical risk minimization problems over a network of agents, modeled as an undirected graph, and derives global complexity bounds for convex and strongly convex losses.
Optimal Gradient Tracking for Decentralized Optimization
TLDR
Optimal Gradient Tracking (OGT) is the first single-loop decentralized gradient-type method that is optimal in both gradient computation and communication complexities.
Inexact Tensor Methods and Their Application to Stochastic Convex Optimization
TLDR
A general non-accelerated tensor method under inexact information on higher-order derivatives is proposed, its convergence rate is analyzed, and sufficient conditions are provided for this method to have a complexity similar to that of the exact tensor method.

References

Showing 1–10 of 37 references
A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods
TLDR
Two algorithms based on the framework of the accelerated penalty method with increasing penalty parameters are presented, which achieve near-optimal complexities for both computation and communication.
Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization
TLDR
The widely used EXTRA and DIGing methods are extended with variance reduction (VR), and the accelerated VR-EXTRA and VR-DIGing, with both the optimal stochastic gradient computation complexity and communication complexity, are proposed.
An Optimal Algorithm for Decentralized Finite Sum Optimization
TLDR
ADFS, which uses local stochastic proximal updates and decentralized communications between nodes, is derived, and a complexity lower bound is given to show that ADFS is optimal among decentralized algorithms.
Revisiting EXTRA for Smooth Distributed Optimization
TLDR
A sharp complexity analysis for EXTRA is given, and EXTRA is further accelerated via the Catalyst framework; both when strong convexity is present and when it is absent, the communication and computation complexities of the accelerated EXTRA are only worse than the corresponding lower bounds by logarithmic factors.
Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
TLDR
The efficiency of MSDA is verified against state-of-the-art methods on two problems: least-squares regression and classification by logistic regression.
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
TLDR
The error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions, and the first optimal first-order decentralized algorithm called multi-step primal-dual (MSPD) and its corresponding optimal convergence rate are provided.
Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
TLDR
This paper introduces a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique that converges to a global and consensual minimizer over time-varying graphs.
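For reference, the DIGing update (up to notation) combines a mixing step with gradient tracking; here $W(k)$ is the doubly stochastic mixing matrix of the graph at time $k$ and $\alpha$ is a constant step size:

$$
x_i^{k+1} \;=\; \sum_{j} W_{ij}(k)\, x_j^{k} \;-\; \alpha\, y_i^{k},
\qquad
y_i^{k+1} \;=\; \sum_{j} W_{ij}(k)\, y_j^{k} \;+\; \nabla f_i(x_i^{k+1}) - \nabla f_i(x_i^{k}),
$$

with $y_i^{0} = \nabla f_i(x_i^{0})$, so that $y_i^{k}$ tracks the network-average gradient.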
Accelerated Distributed Nesterov Gradient Descent
  • Guannan Qu, Na Li
  • IEEE Transactions on Automatic Control, 2020
This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication.
A dual approach for optimal algorithms in distributed optimization over networks
TLDR
This work studies dual-based algorithms for distributed convex optimization problems over networks, and proposes distributed algorithms that achieve the same optimal rates as their centralized counterparts (up to constant and logarithmic factors), with an additional optimal cost related to the spectral properties of the network.
EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization
TLDR
A novel decentralized exact first-order algorithm (abbreviated as EXTRA) is developed to solve the consensus optimization problem; it uses a fixed, large step size, which can be determined independently of the network size or topology.
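For reference, with stacked local iterates $\mathbf{x}^{k}$, a mixing matrix $W$ and $\tilde{W} = \tfrac{1}{2}(I + W)$, the EXTRA recursion can be written (up to notation) as

$$
\mathbf{x}^{1} = W\mathbf{x}^{0} - \alpha \nabla \mathbf{f}(\mathbf{x}^{0}),
\qquad
\mathbf{x}^{k+2} = (I + W)\,\mathbf{x}^{k+1} - \tilde{W}\mathbf{x}^{k} - \alpha\bigl[\nabla \mathbf{f}(\mathbf{x}^{k+1}) - \nabla \mathbf{f}(\mathbf{x}^{k})\bigr],
$$

where $\nabla\mathbf{f}$ stacks the local gradients $\nabla f_i$; the correction term $\tilde{W}\mathbf{x}^{k}$ is what allows exact convergence with a fixed step size $\alpha$.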