Distributed Gradient Methods with Variable Number of Working Nodes

  title={Distributed Gradient Methods with Variable Number of Working Nodes},
  author={Du{\vs}an Jakoveti{\'c} and Dragana Bajovi{\'c} and Natasa Krejic and Nata{\vs}a Krklec Jerinki{\'c}},
  journal={IEEE Transactions on Signal Processing},
We consider distributed optimization where N nodes in a connected network minimize the sum of their local costs subject to a common constraint set. We propose a distributed projected gradient method where each node, at each iteration k, performs an update (is active) with probability pk, and stays idle (is inactive) with probability 1-pk. Whenever active, each node performs an update by weight-averaging its solution estimate with the estimates of its active neighbors, taking a negative gradient… Expand
Distributed first and second order methods with increasing number of working nodes
The presented mechanism relaxes the requirement that all nodes are active at all iterations k, which significantly improves the computational and communication efficiency of some recently proposed first and second order distributed methods for solving distributed optimization problems. Expand
Distributed second order methods with variable number of working nodes
It is demonstrated that the idling mechanism can be successfully incorporated in distributed second order methods also, thus achieving the same order of convergence rate (R-linear) as the standard DQN, but with significantly cheaper updates. Expand
Newton-like Method with Diagonal Correction for Distributed Optimization
This work proposes a class of distributed Newton-like methods, which it refers to as Distributed Quasi Newton (DQN), which approximates the Hessian inverse by: 1) splitting the Hessians into its diagonal and off-diagonal part, 2) inverting the diagonal part, and 3) approximating the inverse of the off- diagonal part through a weighted linear function. Expand
Resource-Aware Exact Decentralized Optimization Using Event-Triggered Broadcasting
The linearized augmented Lagrangian method is used to design an event-triggered decentralized algorithm that only requires light local computation at generic time instants and peer-to-peer communication at sporadic triggering time instant results show its performance and superiority in exploiting communication resources. Expand
Optimality Trade-offs For Distributed Estimation Anit
This paper proposes Communication efficient REcursive Distributed estimatiOn algorithm, CREDO, for networked multi-worker setups without a central master node. CREDO is designed for scenarios inExpand
Accelerated Distributed Dual Averaging Over Evolving Networks of Growing Connectivity
This paper extends the distributed dual averaging (DDA) subgradient algorithm to evolving networks of growing connectivity and provides quantitative evaluation of DDA acceleration for distributed optimization that is absent in the existing analysis. Expand
Stochastic sub-gradient algorithm for distributed optimization with random sleep scheme
In this paper, we consider a distributed convex optimization problem of a multi-agent system with the global objective function as the sum of agents’ individual objective functions. To solve such anExpand
Inference and Optimization over Networks: Communication Efficiency and Optimality
This thesis proposes distributed recursive algorithms where the networked entities simultaneously incorporate locally sensed information and information obtained from the neighborhood in distributed setups which exhibit data parallelism and also potentially analytically intractable loss functions. Expand
Communication-Efficient Distributed Strongly Convex Stochastic Optimization: Non-Asymptotic Rates.
We examine fundamental tradeoffs in iterative distributed zeroth and first order stochastic optimization in multi-agent networks in terms of \emph{communication cost} (number of per-nodeExpand
Distributed Event-Triggered Gradient Method for Constrained Convex Minimization
This paper investigates the distributed gradient method for large-scale convex constrained problems with event-triggered consensus protocols and shows that the convergence can be ensured provided that the event-triggering threshold bound is square summable, and the stepsize satisfies specific conditions that are characterized by the Lipschitz constant of the gradient. Expand


Fast Distributed Gradient Methods
This work proposes two fast distributed gradient algorithms based on the centralized Nesterov gradient algorithm and establishes their convergence rates in terms of the per-node communications K and theper-node gradient evaluations k. Expand
Fast cooperative distributed learning
This work proposes a distributed gradient-like algorithm, that is built from the (centralized) Nesterov gradient method, that converges at rate O(log k/k) and demonstrates the gains obtained on two simulation examples: acoustic source localization and learning a linear classifier based on l2-regularized logistic loss. Expand
Network Newton-Part I: Algorithm and Convergence
We study the problem of minimizing a sum of convex objective functions where the components of the objective are available at different nodes of a network and nodes are allowed to only communicateExpand
A fast distributed proximal-gradient method
  • Annie I. Chen, A. Ozdaglar
  • Computer Science, Mathematics
  • 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2012
This paper shows that this distributed proximal-gradient method for optimizing the average of convex functions, each of which is the private local objective of an agent in a network with time-varying topology converges at the rate 1/k, which is faster than the convergence rate of the existing distributed methods for solving this problem. Expand
Network Newton
The Network Newton method is proposed as a distributed algorithm that incorporates second order information via distributed evaluation of approximations to Newton steps and adaptive (A)NN is introduced in order to establish exact convergence. Expand
Distributed Stochastic Subgradient Projection Algorithms for Convex Optimization
This paper considers a distributed multi-agent network system where the goal is to minimize a sum of convex objective functions of the agents subject to a common convex constraint set, and investigates the effects of stochastic subgradient errors on the convergence of the algorithm. Expand
Convergence analysis of distributed subgradient methods over random networks
  • I. Lobel, A. Ozdaglar
  • Mathematics, Computer Science
  • 2008 46th Annual Allerton Conference on Communication, Control, and Computing
  • 2008
This work considers the problem of cooperatively minimizing the sum of convex functions, where the functions represent local objective functions of the agents and proposes a distributed subgradient method that uses averaging algorithms for locally sharing information among the agents. Expand
Asynchronous gossip algorithms for stochastic optimization
  • S. Ram, A. Nedić, V. Veeravalli
  • Computer Science
  • Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference
  • 2009
An asynchronous algorithm that is motivated by random gossip schemes where each agent has a local Poisson clock and it is proved that the gradients converge to zero with probability 1 and the iterates converge to an optimal solution almost surely. Expand
Distributed Nesterov-like gradient algorithms
This paper presents a distributed, constant step size, Nesterov-like gradient algorithm that converges much faster than existing distributed (sub)gradient methods, with zero additional communications and very small additional computations per iteration, on a class of convex functions with Lipschitz continuous first derivative. Expand
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
This work develops and analyze distributed algorithms based on dual subgradient averaging and provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of thenetwork. Expand