Distributed Optimization, Averaging via ADMM, and Network Topology

Guilherme França and José Bento
Proceedings of the IEEE
There is a growing need for scalable optimization methods, driven largely by the explosion in data set size and model complexity in modern machine learning applications. Scalable solvers often distribute the computation over a network of processing units. For simple algorithms, such as gradient descent, the dependence of the convergence time on the topology of this network is well known. However, for more involved algorithms, such as the alternating direction method of…
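As a rough illustration of how topology affects a simple algorithm, the sketch below runs gradient descent on the consensus objective (1/2) Σ_{(i,j)∈E} (x_i - x_j)^2, whose gradient is L x with L the graph Laplacian, so that the iterates converge to the average of the initial values. The function names, the step size 1/λ_max, and the two example graphs are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def ring_laplacian(n):
    """Graph Laplacian of a cycle on n nodes (hypothetical helper)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[(i + 1) % n, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def complete_laplacian(n):
    """Graph Laplacian of the complete graph on n nodes (hypothetical helper)."""
    return n * np.eye(n) - np.ones((n, n))

def consensus_iterations(L, x0, tol=1e-6, max_iter=100_000):
    """Iterate x <- x - alpha * L x and count the steps needed until
    every node is within tol of the average of x0."""
    eigs = np.linalg.eigvalsh(L)      # eigenvalues in ascending order
    alpha = 1.0 / eigs[-1]            # assumed step size: 1 / lambda_max
    target = x0.mean()                # the sum of x is preserved by each step
    x = x0.copy()
    for k in range(max_iter):
        if np.max(np.abs(x - target)) < tol:
            return k
        x = x - alpha * (L @ x)       # each node only uses its neighbors' values
    return max_iter

rng = np.random.default_rng(0)
x0 = rng.standard_normal(20)
print("ring:    ", consensus_iterations(ring_laplacian(20), x0))
print("complete:", consensus_iterations(complete_laplacian(20), x0))
```

With this step size the error contracts by a factor of 1 - λ_2/λ_max per iteration, where λ_2 is the smallest nonzero Laplacian eigenvalue, so a poorly connected graph such as the ring needs many more iterations than the complete graph.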

How is Distributed ADMM Affected by Network Topology
A full characterization of the convergence of distributed over-relaxed ADMM for the same type of consensus problem is provided in terms of the topology of the underlying graph, and the aforementioned conjecture is proved to hold for any graph, even those whose random walks cannot be accelerated via Markov chain lifting.
Distributed Averaging Via Lifted Markov Chains
This paper designs an algorithm with the fastest possible rate of convergence using a nonreversible Markov chain on the given network graph, using the Metropolis-Hastings method, and provides the fastest mixing Markov chain given the topological constraints of the network.
Polynomial Filtering for Fast Convergence in Distributed Consensus
This paper proposes to accelerate the convergence rate for given network matrices by the use of polynomial filtering algorithms, and formulates the computation of the coefficients of the optimal polynomial as a semidefinite program that can be efficiently and globally solved for both static and dynamic network topologies.
Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
The efficiency of MSDA is verified against state-of-the-art methods on two problems: least-squares regression and classification by logistic regression.
Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization
This paper presents an overview of recent work in decentralized optimization and surveys the state-of-the-art algorithms and their analyses tailored to these different scenarios, highlighting the role of the network topology.
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
The ADMM algorithm for distributed averaging: Convergence rates and optimal parameter selection
This study derives the optimal step-size and over-relaxation parameter that minimize the convergence time of two ADMM-based algorithms for distributed averaging, and optimizes the edge-weights of the communication graph to improve the convergence speed.
Distributed Estimation and Control of Algebraic Connectivity Over Random Graphs
Using results from stochastic approximation theory, it is proved that the proposed method converges almost surely (a.s.) to the desired value of connectivity even in the presence of imperfect communication scenarios.
An explicit rate bound for over-relaxed ADMM
  • G. França, José Bento
  • Computer Science, Mathematics
    2016 IEEE International Symposium on Information Theory (ISIT)
  • 2016
This paper provides an exact analytical solution to this semidefinite program (SDP) and obtains a general and explicit upper bound on the convergence rate of the entire family of over-relaxed ADMM.
Accelerating Distributed Consensus Via Lifting Markov Chains
  • Wen J. Li, H. Dai
  • Computer Science
    2007 IEEE International Symposium on Information Theory
  • 2007
A Location-Aided Distributed Averaging (LADA) algorithm is proposed, which utilizes local information to construct a fast-mixing nonreversible chain in a distributed manner, and it is shown that using LADA, an ε-averaging time of Θ(r^{-1} log(1/ε)) is achievable in a wireless network with transmission radius r.