Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

@article{Boyd2011DistributedOA,
  title={Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers},
  author={Stephen P. Boyd and Neal Parikh and Eric Chu and Borja Peleato and Jonathan Eckstein},
  journal={Found. Trends Mach. Learn.},
  year={2011},
  volume={3},
  pages={1-122}
}
Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review… 
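For orientation, the method the review develops is the following splitting: for a problem minimize f(x) + g(z) subject to Ax + Bz = c, ADMM alternately minimizes the augmented Lagrangian over x and over z and then takes a dual update step. The sketch below uses the scaled dual variable $u$ and penalty parameter $\rho > 0$; it is included only as a reminder of the standard iteration, with symbols as in the monograph.

\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_{x}\ f(x) + \tfrac{\rho}{2}\,\|Ax + Bz^{k} - c + u^{k}\|_2^2,\\
z^{k+1} &= \operatorname*{argmin}_{z}\ g(z) + \tfrac{\rho}{2}\,\|Ax^{k+1} + Bz - c + u^{k}\|_2^2,\\
u^{k+1} &= u^{k} + Ax^{k+1} + Bz^{k+1} - c.
\end{aligned}

In the global-consensus form used for distributed model fitting, each of $N$ nodes holds a local copy $x_i$ with the constraint $x_i = z$, so the $x$-update splits into $N$ independent subproblems (one per data block) and the $z$-update reduces to an averaging step.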
Generalizations of the Alternating Direction Method of Multipliers for Large-Scale and Distributed Optimization
TLDR
This thesis makes important generalizations of ADMM, extends its convergence theory, and proposes a generalized ADMM framework that allows more options for solving the subproblems, either exactly or approximately.
Sensitivity Assisted Alternating Directions Method of Multipliers for Distributed Optimization and Statistical Learning
TLDR
A sensitivity-assisted ADMM algorithm is proposed that leverages parametric sensitivities so that the subproblem solutions can be approximated using a tangential predictor, reducing the computational burden to a single linear solve.
Distributed algorithms for convex problems with linear coupling constraints
TLDR
An augmented Lagrangian method is proposed for convex problems with linear coupling constraints that can be distributed and requires only a single gradient projection step at every iteration; a distributed version of the algorithm is also introduced, allowing the data to be partitioned and the computation to be performed in parallel.
Proximal Splitting Algorithms: Overrelax them all!
TLDR
This paper presents several splitting methods under the single umbrella of the forward–backward iteration for solving monotone inclusions, applied with preconditioning, and shows that, when the smooth term in the objective function is quadratic, convergence is guaranteed for larger values of the relaxation parameter than previously known.
Modern Optimization for Statistics and Learning
TLDR
This dissertation introduces a novel algorithm for variable clustering named FORCE, based on solving a convex relaxation of the K-means criterion, as well as post-dimension reduction inferential procedures, and derives a novel class of variance-reduced estimators called Marginal Policy Gradients.
Stochastic Alternating Direction Method of Multipliers
TLDR
This paper establishes the convergence rate of ADMM for convex problems in terms of both the objective value and the feasibility violation, and proposes a stochastic ADMM algorithm for optimization problems with non-smooth composite objective functions.
Convex Optimization and Extensions, with a View Toward Large-Scale Problems
TLDR
It is proved that ADMM converges on multiaffine problems satisfying certain assumptions and, more broadly, the theoretical properties of ADMM for general problems are analyzed, investigating the effect of different types of structure.
Augmented Lagrangian and Alternating Direction Methods for Convex Optimization: A Tutorial and Some Illustrative Computational Results
TLDR
This chapter, assuming as little prior knowledge of convex analysis as possible, shows that the actual convergence mechanism of the algorithm is quite different, and underscores this observation with some new computational results in which ADMM is compared to algorithms that do indeed work by approximately minimizing the augmented Lagrangian.
Efficient Distributed Linear Classification Algorithms via the Alternating Direction Method of Multipliers
TLDR
This paper proposes and implements distributed algorithms that achieve parallel disk loading and access the disk only once, and that are faster than existing distributed solvers such as that of Zinkevich et al.
Adaptive Stochastic Alternating Direction Method of Multipliers
TLDR
Stochastic ADMM algorithms with optimal second-order proximal functions are presented, yielding a new family of adaptive subgradient methods, and it is proved theoretically that their regret bounds are as good as those achievable by the best proximal function chosen in hindsight.
...
...

References

Showing 1–10 of 209 references.
Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems
TLDR
This paper proposes gradient projection algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems and tests variants of this approach that select the line-search parameters in different ways, including techniques based on the Barzilai-Borwein method.
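As a reminder of what the BCQP formulation refers to (the objective below is the standard $\ell_2$–$\ell_1$ sparse-reconstruction problem; the symbols are generic, not quoted from this entry): splitting $x$ into its positive and negative parts, $x = u - v$ with $u, v \ge 0$, turns the non-smooth $\ell_1$ term into a linear one over the nonnegative orthant,

\min_{x}\ \tfrac{1}{2}\|y - Ax\|_2^2 + \tau\|x\|_1
\quad\Longleftrightarrow\quad
\min_{u \ge 0,\ v \ge 0}\ \tfrac{1}{2}\|y - A(u - v)\|_2^2 + \tau\,\mathbf{1}^{\top}(u + v),

which is a bound-constrained quadratic program to which gradient projection applies directly.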
Gradient methods for minimizing composite objective function
In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and…
Bundle Methods for Regularized Risk Minimization
TLDR
The theory and implementation of a scalable and modular convex solver which solves all these estimation problems, which can be parallelized on a cluster of workstations, allows for data-locality, and can deal with regularizers such as L1 and L2 penalties is described.
Distributed Sparse Linear Regression
TLDR
Three novel algorithms to estimate the regression coefficients via Lasso when the training data are distributed across different agents, and their communication to a central processing unit is prohibited for e.g., communication cost or privacy reasons are developed.
Alternating Direction Algorithms for ℓ1-Problems in Compressive Sensing
TLDR
This paper proposes and investigates two classes of algorithms derived from either the primal or the dual form of $\ell_1$-problems, and presents numerical results to emphasize two practically important but perhaps overlooked points: that algorithm speed should be evaluated relative to appropriate solution accuracy; and that when erroneous measurements possibly exist, the $\ell_1$ fidelity should generally be preferable to the $\ell_2$ fidelity.
Fast Solution of ℓ1-Norm Minimization Problems When the Solution May Be Sparse
TLDR
Homotopy is shown to run much more rapidly than general-purpose LP solvers when sufficient sparsity is present, implying that Homotopy may be used to rapidly decode error-correcting codes in a stylized communication system with a computational budget constraint.
Alternating Direction Methods for Sparse Covariance Selection *
TLDR
This paper applies the well-known alternating direction method (ADM), which is also a first-order method, to solve the convex relaxation of SCSP; preliminary numerical results show that the ADM approach substantially outperforms existing first-order methods for SCSP.
Some Reformulations and Applications of the Alternating Direction Method of Multipliers
TLDR
The alternating direction method of multipliers decomposition algorithm for convex programming, as recently generalized by Eckstein and Bertsekas, is considered; some reformulations of the algorithm are given, and several alternative means for deriving them are discussed.
Sparse Inverse Covariance Selection via Alternating Linearization Methods
TLDR
This paper proposes a first-order method based on an alternating linearization technique that exploits the problem's special structure; in particular, the subproblems solved in each iteration have closed-form solutions.
Monotone operator splitting for optimization problems in sparse recovery
TLDR
This paper formalizes many of the optimization problems involved in the recovery of sparse solutions of linear inverse problems within a unified framework of convex optimization theory, and invokes tools from convex analysis and maximal monotone operator splitting.
...
...