# Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

```bibtex
@article{Boyd2011DistributedOA,
  title={Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers},
  author={Stephen P. Boyd and Neal Parikh and Eric King-wah Chu and Borja Peleato and Jonathan Eckstein},
  journal={Found. Trends Mach. Learn.},
  year={2011},
  volume={3},
  pages={1--122}
}
```
• Published 23 May 2011
• Found. Trends Mach. Learn.
Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets and the accompanying distributed solution methods are either necessary or at least highly desirable. In this review…
14,531 Citations
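As a minimal sketch of the method the paper surveys: scaled-form ADMM for the lasso problem (a standard example in this literature) alternates a ridge-style least-squares solve, a soft-thresholding step, and a dual update. The function names, parameter defaults, and stopping rule below are illustrative choices, not taken from the paper.

```python
import numpy as np

def soft_threshold(v, k):
    """Proximal operator of k * ||.||_1: shrinks each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 via scaled-form ADMM."""
    m, n = A.shape
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # Factor (A^T A + rho*I) once; it is reused in every x-update.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(iters):
        # x-update: quadratic subproblem, a ridge-regression-style solve
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        # z-update: closed-form soft-thresholding
        z = soft_threshold(x + u, lam / rho)
        # dual update: accumulate the running residual x - z
        u = u + x - z
    return z
```

Caching the Cholesky factor is the standard trick that makes each iteration cheap; in the consensus formulations the review emphasizes, the x-update splits across machines and only z and u are exchanged.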
Generalizations of the Alternating Direction Method of Multipliers for Large-Scale and Distributed Optimization
This thesis makes important generalizations to ADMM, extends its convergence theory, and proposes a generalized ADMM framework that allows more options for solving the subproblems, either exactly or approximately.
Sensitivity Assisted Alternating Directions Method of Multipliers for Distributed Optimization and Statistical Learning
• Computer Science
• 2020
A sensitivity-assisted ADMM algorithm is proposed that leverages parametric sensitivities so that the subproblem solutions can be approximated using a tangential predictor, reducing the computational burden to a single linear solve.
Distributed algorithms for convex problems with linear coupling constraints
• Computer Science
J. Glob. Optim.
• 2020
An augmented Lagrangian method is proposed for convex problems with linear coupling constraints that can be distributed and requires a single gradient projection step at every iteration; a distributed version of the algorithm is also introduced that partitions the data and performs the computation in parallel.
Proximal Splitting Algorithms: Overrelax them all!
• Computer Science
• 2019
This paper presents several splitting methods in the single umbrella of the forward–backward iteration to solve monotone inclusions, applied with preconditioning, and shows that, when the smooth term in the objective function is quadratic, convergence is guaranteed with larger values of the relaxation parameter than previously known.
Modern Optimization for Statistics and Learning
This dissertation introduces a novel algorithm for variable clustering named FORCE, based on solving a convex relaxation of the K-means criterion, as well as post-dimension reduction inferential procedures, and derives a novel class of variance-reduced estimators called Marginal Policy Gradients.
Stochastic Alternating Direction Method of Multipliers
• Computer Science, Mathematics
ICML
• 2013
This paper establishes the convergence rate of ADMM for convex problems in terms of both the objective value and the feasibility violation, and proposes a stochastic ADMM algorithm for optimization problems with non-smooth composite objective functions.
Convex Optimization and Extensions, with a View Toward Large-Scale Problems
It is proved that ADMM converges on multiaffine problems satisfying certain assumptions; more broadly, the theoretical properties of ADMM for general problems are analyzed, investigating the effect of different types of structure.
Augmented Lagrangian and Alternating Direction Methods for Convex Optimization: A Tutorial and Some Illustrative Computational Results
This chapter, assuming as little prior knowledge of convex analysis as possible, shows that the actual convergence mechanism of the algorithm is quite different, and underscores these observations with some new computational results in which ADMM is compared to algorithms that do indeed work by approximately minimizing the augmented Lagrangian.
Efficient Distributed Linear Classification Algorithms via the Alternating Direction Method of Multipliers
• Computer Science
AISTATS
• 2012
This paper proposes and implements distributed algorithms that achieve parallel disk loading, access the disk only once, and are faster than existing distributed solvers such as that of Zinkevich et al.
Adaptive Stochastic Alternating Direction Method of Multipliers
• Computer Science
ICML
• 2015
Stochastic ADMM algorithms with optimal second order proximal functions are presented, which produce a new family of adaptive subgradient methods and theoretically prove that their regret bounds are as good as the bounds which could be achieved by the best proximal function that can be chosen in hindsight.

## References

Showing 1–10 of 209 references
Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems
• Computer Science
IEEE Journal of Selected Topics in Signal Processing
• 2007
This paper proposes gradient projection algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems and test variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method.
Gradient methods for minimizing composite objective function
In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and…
Bundle Methods for Regularized Risk Minimization
• Computer Science
J. Mach. Learn. Res.
• 2010
The theory and implementation of a scalable and modular convex solver is described; the solver handles all of these estimation problems, can be parallelized on a cluster of workstations, allows for data locality, and can deal with regularizers such as L1 and L2 penalties.
Distributed Sparse Linear Regression
• Computer Science
IEEE Transactions on Signal Processing
• 2010
Three novel algorithms to estimate the regression coefficients via Lasso when the training data are distributed across different agents, and their communication to a central processing unit is prohibited for e.g., communication cost or privacy reasons are developed.
Alternating Direction Algorithms for $\ell_1$-Problems in Compressive Sensing
• Computer Science
SIAM J. Sci. Comput.
• 2011
This paper proposes and investigates two classes of algorithms derived from either the primal or the dual form of $\ell_1$-problems, and presents numerical results to emphasize two practically important but perhaps overlooked points: that algorithm speed should be evaluated relative to appropriate solution accuracy; and that when erroneous measurements possibly exist, the $\ell_1$-fidelity should generally be preferable to the $\ell_2$-fidelity.
Fast Solution of $\ell_1$-Norm Minimization Problems When the Solution May Be Sparse
• Computer Science
• 2008
Homotopy is shown to run much more rapidly than general-purpose LP solvers when sufficient sparsity is present, implying that Homotopy may be used to rapidly decode error-correcting codes in a stylized communication system with a computational budget constraint.
Alternating Direction Methods for Sparse Covariance Selection
This paper applies the well-known alternating direction method (ADM), which is also a first-order method, to solve the convex relaxation of SCSP; preliminary numerical results show that the ADM approach substantially outperforms existing first-order methods for SCSP.
Some Reformulations and Applications of the Alternating Direction Method of Multipliers
• Computer Science
• 1994
The alternating direction method of multipliers decomposition algorithm for convex programming, as recently generalized by Eckstein and Bertsekas, is considered; some reformulations of the algorithm are given, and several alternative means for deriving them are discussed.
Sparse Inverse Covariance Selection via Alternating Linearization Methods
• Computer Science
NIPS
• 2010
This paper proposes a first-order method based on an alternating linearization technique that exploits the problem's special structure; in particular, the subproblems solved in each iteration have closed-form solutions.
Monotone operator splitting for optimization problems in sparse recovery
• Computer Science
2009 16th IEEE International Conference on Image Processing (ICIP)
• 2009
This paper formalizes many of the optimization problems involved in recovering sparse solutions of linear inverse problems within a unified framework of convex optimization theory, and invokes tools from convex analysis and maximal monotone operator splitting.