DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization

Boyue Li, Zhize Li, and Yuejie Chi. SIAM Journal on Mathematics of Data Science.

Emerging applications in multi-agent environments, such as the internet of things, networked sensing, autonomous systems, and federated learning, call for decentralized algorithms for finite-sum optimization that are resource-efficient in terms of both computation and communication. In this paper, we consider the prototypical setting where the agents work collaboratively to minimize the sum of local loss functions by only communicating with their neighbors over a predetermined network topology. We…
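The prototypical setting described in the abstract is usually written as the following finite-sum problem (a standard formulation; the notation below is chosen here for illustration, not quoted from the paper):

```latex
\min_{x \in \mathbb{R}^d} \; f(x) := \frac{1}{n}\sum_{i=1}^{n} f_i(x),
\qquad
f_i(x) := \frac{1}{m}\sum_{j=1}^{m} \ell(x; z_{i,j}),
```

where agent $i$ holds only its local samples $\{z_{i,j}\}_{j=1}^{m}$ and exchanges information solely with its neighbors in the communication graph.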

Figures and Tables from this paper

Harvesting Curvatures for Communication-Efficient Distributed Optimization

A novel algorithm is proposed that refines this idea by constructing a second-order correction term using a BFGS-style update formula, where the kernel matrix is updated recursively using only history gradients to harvest curvature information for accelerating convergence.

BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression

This paper proposes BEER, which adopts communication compression with gradient tracking, and shows it converges at a faster rate of O(1/T) than the state-of-the-art rate, matching the rate without compression even under arbitrary data heterogeneity.
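Compressed-communication schemes like the one summarized above typically rely on a contractive compressor; top-k sparsification, sketched below, is one standard example (this helper is illustrative, not code from the BEER paper):

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest.

    Illustrative contractive compressor: the residual satisfies
    ||v - top_k(v)||^2 <= (1 - k/d) * ||v||^2 for v in R^d.
    """
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]  # indices of the k largest magnitudes
    out[idx] = v[idx]
    return out

v = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
c = top_k(v, 2)  # keeps -3.0 and 2.0, zeros the rest
```

Agents would transmit only the k surviving index-value pairs per round, trading per-round accuracy for bandwidth.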

An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum Optimization

A Lyapunov function is constructed that simultaneously characterizes the function value, the gradient estimation error and the consensus error for the convergence analysis of DEAREST, the best-known optimal algorithm for decentralized nonconvex optimization.

Proximal Stochastic Recursive Momentum Methods for Nonconvex Composite Decentralized Optimization

This work proposes a single-loop algorithm, called DEEPSTORM, that achieves optimal sample complexity for decentralized nonconvex stochastic composite problems, requiring $\mathcal{O}(1)$ batch size, and shows that DEEPSTORM with a constant step size achieves a network-independent sample complexity.

A Simple and Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax Optimization

To the best of the authors' knowledge, DREAM is the first algorithm whose SFO and communication complexities simultaneously achieve the optimal dependency on $\epsilon$ and $\lambda_2(W)$ for this problem.

Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology

A general framework unifying several gradient-based stochastic optimization methods for empirical risk minimization problems both in centralized and distributed scenarios is developed, and the obtained rate can recover the best known results for many existing algorithms.

SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression

A framework called SoteriaFL is proposed, which accommodates a general family of local gradient estimators, including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme, and is shown to achieve better communication complexity than other private federated learning algorithms without communication compression, while sacrificing neither privacy nor utility.

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

This work proposes and analyzes several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer, and proposes an optimal algorithm, called SSRGD, based on SARAH, which can find an $\epsilon$-approximate (first-order) stationary point by simply adding some random perturbations.

Communication-Efficient Distributed Optimization in Networks with Gradient Tracking

This work suggests that performing a certain amount of local communications and computations per iteration can substantially improve the overall efficiency, and extends Network-DANE to composite optimization by allowing a nonsmooth penalty term.

Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization

This paper presents an overview of recent work in decentralized optimization and surveys state-of-the-art algorithms and their analyses tailored to these different scenarios, highlighting the role of the network topology.

Balancing Communication and Computation in Distributed Optimization

This paper proposes an adaptive cost framework that adjusts the cost measure depending on the features of various applications, and presents a flexible algorithmic framework, where communication and computation steps are explicitly decomposed to enable algorithm customization for various applications.

NEXT: In-Network Nonconvex Optimization

  • P. Di Lorenzo, G. Scutari
  • Computer Science
    IEEE Transactions on Signal and Information Processing over Networks
  • 2016
This work introduces the first algorithmic framework for the distributed minimization of the sum of a smooth function (the agents' sum-utility) plus a convex (possibly nonsmooth and nonseparable) regularizer, and shows that the new method compares favorably to existing distributed algorithms on both convex and nonconvex problems.

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

This work proposes an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates).

A General Framework for Decentralized Optimization With First-Order Methods

A general framework of decentralized first-order methods that is applicable to directed and undirected communication networks alike is provided and it is shown that much of the existing work on optimization and consensus can be related explicitly to this framework.

Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs

This paper introduces a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique that converges to a global and consensual minimizer over time-varying graphs.
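The gradient-tracking update at the heart of methods like the one above can be sketched in a few lines. Below is an illustrative toy instance with scalar quadratic local losses and a fixed ring network; all names, constants, and the step size are chosen here for the sketch and are not from the paper:

```python
import numpy as np

# Each agent i holds f_i(x) = 0.5 * a_i * (x - b_i)^2 and mixes
# with its neighbors via a doubly stochastic matrix W.
n = 4
a = np.array([1.0, 2.0, 3.0, 4.0])   # local curvatures
b = np.array([1.0, -1.0, 2.0, 0.0])  # local minimizers

def grad(x):
    """Stacked local gradients: grad_i = a_i * (x_i - b_i)."""
    return a * (x - b)

# Metropolis weights for a ring of 4 agents (symmetric, doubly stochastic).
W = np.array([
    [1/2, 1/4, 0.0, 1/4],
    [1/4, 1/2, 1/4, 0.0],
    [0.0, 1/4, 1/2, 1/4],
    [1/4, 0.0, 1/4, 1/2],
])

alpha = 0.05
x = np.zeros(n)   # local iterates
y = grad(x)       # gradient trackers, initialized to the local gradients
g_old = grad(x)

for _ in range(2000):
    x = W @ x - alpha * y      # mix with neighbors, step along the tracker
    g_new = grad(x)
    y = W @ y + g_new - g_old  # track the network-average gradient
    g_old = g_new

# Global minimizer of sum_i f_i: x* = sum(a*b) / sum(a) = 0.5 here.
x_star = np.dot(a, b) / np.sum(a)
```

Because W is doubly stochastic, the mean of the trackers y always equals the mean of the current local gradients, which is what lets every agent descend along (an estimate of) the global gradient despite heterogeneous local losses.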

On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

This work shows that performing gradient iterations with a constant step size enables convergence to within $\epsilon$ of the optimal value for smooth nonconvex objectives satisfying the Polyak-Łojasiewicz condition, and that this result also holds for smooth strongly convex objectives.
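The gossip steps discussed above are plain averaging with a mixing matrix; repeating them drives every local value toward the network average. A minimal sketch on a 4-node path graph with Metropolis weights (the graph and weights are chosen here for illustration):

```python
import numpy as np

# Path graph 0-1-2-3 with Metropolis weights
# w_ij = 1 / (1 + max(deg_i, deg_j)); diagonal fills each row to 1.
# The result is symmetric and doubly stochastic.
W = np.array([
    [2/3, 1/3, 0.0, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 1/3, 1/3, 1/3],
    [0.0, 0.0, 1/3, 2/3],
])

x = np.array([4.0, 0.0, -2.0, 6.0])  # initial local values, average = 2.0
for _ in range(200):                  # repeated gossip steps
    x = W @ x                         # contracts x toward the consensus average
```

Each step shrinks the disagreement by roughly the second-largest eigenvalue of W, which is why spending several gossip steps per round (rather than one) can pay off when communication is the bottleneck.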

S-NEAR-DGD: A Flexible Distributed Stochastic Gradient Method for Inexact Communication

The theoretical results prove that the proposed algorithm converges linearly in expectation to a neighborhood of the optimal solution for strongly convex objective functions with Lipschitz gradients.