A Distributed Quasi-Newton Algorithm for Empirical Risk Minimization with Nonsmooth Regularization

@inproceedings{Lee2018ADQ,
  title={A Distributed Quasi-Newton Algorithm for Empirical Risk Minimization with Nonsmooth Regularization},
  author={Ching-pei Lee and Cong Han Lim and Stephen J. Wright},
  booktitle={Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  year={2018}
}
We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving ERM problems with a nonsmooth regularization term. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or work only for specific regularizers. Our algorithm uses successive quadratic approximations, and we describe how to maintain an approximation of the Hessian and solve subproblems efficiently in a… 
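
As a minimal sketch of the setting the abstract describes (the notation f, g, w_k, B_k below is ours, not necessarily the paper's), the objective is a composite empirical risk

  \min_{w} \; F(w) := f(w) + g(w),

with f a smooth empirical loss averaged over the training data and g a nonsmooth regularizer such as the \ell_1 norm. A successive-quadratic-approximation method repeatedly and approximately minimizes

  Q_k(p) = \nabla f(w_k)^\top p + \tfrac{1}{2}\, p^\top B_k p + g(w_k + p),

where B_k is a quasi-Newton (e.g. limited-memory BFGS) approximation of \nabla^2 f(w_k), and then updates w_{k+1} = w_k + \alpha_k p_k with a suitable step size.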

Citations

A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

TLDR
A communication- and computation-efficient distributed optimization algorithm that uses second-order information to solve empirical risk minimization (ERM) problems with a nonsmooth regularization term; it enjoys global linear convergence for a broad range of non-strongly-convex problems that includes the most commonly used ERMs, and thus achieves lower communication complexity.

DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate

TLDR
A distributed asynchronous quasi-Newton algorithm with superlinear convergence guarantees is developed, believed to be the first distributed asynchronous algorithm with such guarantees.

Partial-Quasi-Newton Methods: Efficient Algorithms for Minimax Optimization Problems with Unbalanced Dimensionality

TLDR
A novel second-order optimization algorithm, the Partial-Quasi-Newton (PQN) method, takes advantage of the unbalanced structure of the problem to build its Hessian estimate efficiently, and it is proved theoretically that the PQN method converges to the saddle point faster than existing minimax optimization algorithms.

A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization

TLDR
This work designs a proximal gradient decentralized algorithm whose fixed point coincides with the desired minimizer and provides a concise proof that establishes its linear convergence.

Successive Quadratic Approximation for Regularized Optimization

TLDR
This work presents a global analysis of the iteration complexity of inexact successive quadratic approximation methods, showing that it suffices to solve the subproblem to a fixed multiplicative precision in order to guarantee the same order of convergence rate as the exact version.

Inexact Successive Quadratic Approximation for Regularized Optimization

TLDR
This work presents a global analysis of the iteration complexity of inexact successive quadratic approximation methods, showing that an inexact solution of the subproblem that is within a fixed multiplicative precision of optimality suffices to guarantee the same order of convergence rate as the exact version of the method.
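
One common way to formalize "fixed multiplicative precision" (our notation; the papers' exact conditions may differ in details) is to accept an approximate subproblem solution d_k whenever

  Q_k(d_k) - \inf_p Q_k(p) \;\le\; \eta \, \bigl( Q_k(0) - \inf_p Q_k(p) \bigr), \qquad \eta \in [0, 1),

i.e. the achieved decrease of the quadratic model Q_k is at least a fixed fraction 1 - \eta of the best decrease attainable.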

A Distributed Second-Order Algorithm You Can Trust

TLDR
A new algorithm for distributed training of generalized linear models is presented that requires only the computation of diagonal blocks of the Hessian matrix on the individual workers and dynamically adapts the auxiliary model to compensate for modeling errors.

CoCoA: A General Framework for Communication-Efficient Distributed Optimization

TLDR
This work presents CoCoA, a general-purpose framework for distributed computing environments with an efficient communication scheme that is applicable to a wide variety of problems in machine learning and signal processing, and extends the framework to cover general non-strongly-convex regularizers, including L1-regularized problems such as the lasso.

Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

TLDR
A framework is developed that allows the stepsize and momentum parameters of these algorithms to be chosen so as to optimize performance by systematically trading off bias, variance, robustness to gradient noise, and dependence on network effects.

L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method

TLDR
This work proposes L-DQN, a distributed algorithm for solving empirical risk minimization problems under the master/worker communication model; it is the first distributed quasi-Newton method with provable global linear convergence guarantees in the asynchronous setting, where delays between nodes are present.

References

(Showing 10 of the paper's 30 references.)

Practical inexact proximal quasi-Newton method with global complexity analysis

TLDR
A general framework is proposed that includes slightly modified versions of existing algorithms as well as a new algorithm using limited-memory BFGS Hessian approximations, together with a novel global convergence rate analysis that covers methods solving the subproblems via coordinate descent.
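
As background for the limited-memory BFGS approximations mentioned above, here is a minimal Python/NumPy sketch (names and structure are ours, not the paper's solver) of the standard two-loop recursion that applies the inverse-Hessian approximation to a vector; proximal quasi-Newton methods embed the same curvature pairs in a quadratic model rather than taking this step directly.

import numpy as np

def lbfgs_two_loop(grad, s_list, y_list):
    # Standard L-BFGS two-loop recursion: returns an approximation of
    # H_k @ grad, where H_k is the inverse-Hessian approximation built
    # from stored pairs s_i = w_{i+1} - w_i and y_i = g_{i+1} - g_i
    # (lists ordered from oldest to newest).
    if not s_list:
        return np.array(grad, dtype=float)
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    q = np.array(grad, dtype=float)
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        q -= alpha * y
        alphas.append(alpha)
    # Initial scaling H_0 = gamma * I with gamma = s^T y / y^T y (common choice).
    s, y = s_list[-1], y_list[-1]
    r = ((s @ y) / (y @ y)) * q
    # Second loop: oldest pair to newest pair.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return r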

DiSCO: Distributed Optimization for Self-Concordant Empirical Loss

TLDR
The algorithm is based on an inexact damped Newton method, where the inexact Newton steps are computed by a distributed preconditioned conjugate gradient method, and its iteration complexity and communication efficiency for minimizing self-concordant empirical loss functions are analyzed.
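
The inexact Newton steps referred to above solve linear systems of the form H d = -g. Below is a generic, non-distributed preconditioned conjugate gradient sketch in Python (the names hess_vec, apply_M_inv, and newton_step_pcg are our assumptions, not DiSCO's API); it needs only Hessian-vector products, which is the quantity that can be aggregated across workers, while the preconditioner is left abstract here.

import numpy as np

def newton_step_pcg(hess_vec, grad, apply_M_inv, tol=1e-8, max_iter=100):
    # Preconditioned conjugate gradient for the Newton system H d = -grad,
    # using only Hessian-vector products hess_vec(v) = H @ v and a
    # preconditioner apply_M_inv(v) = M^{-1} @ v.
    d = np.zeros(len(grad))
    r = -np.asarray(grad, dtype=float)   # residual b - H d with d = 0, b = -grad
    z = apply_M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        Hp = hess_vec(p)
        alpha = rz / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        z = apply_M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return d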

Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators

TLDR
It is shown that the proximal quasi-Newton method is provably superlinearly convergent, even in the absence of strong convexity, by leveraging a restricted variant of strong convexity.

A coordinate gradient descent method for nonsmooth separable minimization

TLDR
A (block) coordinate gradient descent method is proposed for solving this class of nonsmooth separable problems; global convergence is established and, under a local Lipschitzian error bound assumption, linear convergence as well.
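
To make the coordinate updates concrete, here is a minimal sketch for the special case where the separable nonsmooth term is the \ell_1 norm (our simplified illustration with our own names w_j, g_j, h_j, lam, not the paper's general setting): each one-dimensional subproblem has a closed-form soft-thresholding solution.

import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * |.|: shrink z toward zero by tau.
    return np.sign(z) * max(abs(z) - tau, 0.0)

def coordinate_step(w_j, g_j, h_j, lam):
    # One coordinate update: minimize over d
    #   g_j * d + 0.5 * h_j * d**2 + lam * |w_j + d|,
    # where g_j is the partial derivative of the smooth part and h_j > 0
    # is a curvature estimate for coordinate j (e.g. a Hessian diagonal).
    # Substituting u = w_j + d reduces this to a scalar prox problem.
    u = soft_threshold(w_j - g_j / h_j, lam / h_j)
    return u - w_j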

Proximal Quasi-Newton Methods for Convex Optimization

TLDR
The analysis and computational results show that acceleration may not bring any benefit in the quasi-Newton setting, and a practical comparison of the accelerated proximal quasi-Newton algorithm with the regular one is performed.

The Common-directions Method for Regularized Empirical Risk Minimization

TLDR
This work proposes an interpolation between first- and second-order methods for regularized empirical risk minimization that exploits the problem structure to efficiently combine multiple update directions, attaining both the optimal global linear convergence rate of first-order methods and local quadratic convergence.

Communication-Efficient Distributed Optimization using an Approximate Newton-type Method

TLDR
A novel Newton-type method for distributed optimization is presented that is particularly well suited for stochastic optimization and learning problems and enjoys a linear rate of convergence that provably improves with the data size.

Inexact Successive Quadratic Approximation for Regularized Optimization

TLDR
This work presents a global analysis of the iteration complexity of inexact successive quadratic approximation methods, showing that an inexact solution of the subproblem that is within a fixed multiplicative precision of optimality suffices to guarantee the same order of convergence rate as the exact version of the method.

Variable Metric Inexact Line-Search-Based Methods for Nonsmooth Optimization

TLDR
A new proximal-gradient method is developed for minimizing the sum of a differentiable, possibly nonconvex, function and a convex, possibly nondifferentiable, function, together with an Armijo-like rule for determining a stepsize that ensures sufficient decrease of the objective function.

Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

TLDR
This work shows that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates.
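
In our notation (the paper's precise error model may differ), the inexact iteration analyzed there has the form

  x_{k+1} \approx \operatorname{prox}_{\alpha g}\bigl( x_k - \alpha (\nabla f(x_k) + e_k) \bigr),

where e_k is an error in the gradient and the proximal operator itself may only be evaluated to accuracy \varepsilon_k; the error-free rates are recovered when the sequences \{\|e_k\|\} and \{\varepsilon_k\} decrease at suitable rates.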