• Corpus ID: 199000989

# Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity

@article{Huang2019NonconvexZS,
  title={Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity},
  author={Feihu Huang and Shangqian Gao and Jian Pei and Heng Huang},
  journal={ArXiv},
  year={2019},
  volume={abs/1907.13463}
}
• Published 30 July 2019
• Computer Science
• ArXiv
Zeroth-order (gradient-free) methods are a class of powerful optimization tools for many machine learning problems, because they only need function values (not gradients) during optimization. In particular, zeroth-order methods are well suited to complex problems such as black-box attacks and bandit feedback, whose explicit gradients are difficult or infeasible to obtain. Although many zeroth-order methods have recently been developed, these approaches still suffer from two main drawbacks: 1) high…
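The core primitive shared by the zeroth-order methods on this page is a finite-difference gradient estimate built from function values alone. A minimal sketch in Python of the two-point Gaussian-direction estimator (the function name, `mu`, and `num_dirs` are illustrative choices, not values from the paper):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, num_dirs=10, rng=None):
    """Two-point zeroth-order gradient estimate of f at x, averaged over
    random Gaussian directions; uses only function evaluations of f."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape[0])   # random search direction
        g += (f(x + mu * u) - f(x)) / mu * u  # finite difference along u
    return g / num_dirs

# Usage: estimate the gradient of f(x) = 0.5*||x||^2, whose true gradient is x.
f = lambda x: 0.5 * np.dot(x, x)
x = np.array([1.0, -2.0, 3.0])
g_hat = zo_gradient(f, x, num_dirs=2000, rng=np.random.default_rng(0))
```

The estimate is unbiased for the Gaussian-smoothed surrogate of `f` and concentrates around the true gradient as `num_dirs` grows and `mu` shrinks.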
## 10 Citations

### Accelerated Zeroth-Order Momentum Methods from Mini to Minimax Optimization

• Computer Science
ArXiv
• 2020
An accelerated momentum descent ascent (Acc-MDA) method is presented for solving the white-box minimax problems, and it is proved that it achieves the best known gradient complexity of $\tilde{O}(\kappa_y^3\epsilon^{-3})$ without large batches.

### SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

• Zhe Wang
• Computer Science
• 2018
This paper proposes SpiderBoost as an improved scheme, which allows to use a much larger constant-level stepsize while maintaining the same near-optimal oracle complexity, and can be extended with proximal mapping to handle composite optimization (which is nonsmooth and nonconvex) with provable convergence guarantee.

### Accelerated Stochastic Gradient-free and Projection-free Methods

• Computer Science
ICML
• 2020
An accelerated stochastic zeroth-order Frank-Wolfe (Acc-SZOFW) method is proposed, based on the variance-reduced technique of SPIDER/SpiderBoost and a novel momentum acceleration technique, which still reaches the function query complexity of $O(d\epsilon^{-3})$ in the stochastic problem without relying on any large batches.

### Faster Stochastic Quasi-Newton Methods

• Computer Science
IEEE Transactions on Neural Networks and Learning Systems
• 2022
A novel faster stochastic QN method (SpiderSQN) based on the variance-reduced technique of SPIDER is proposed, and it is proved that this method reaches the best known SFO complexity, which also matches the existing best result.

### Discrete Model Compression With Resource Constraint for Deep Neural Networks

• Computer Science
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
• 2020
An efficient discrete optimization method to directly optimize channel-wise differentiable discrete gate under resource constraint while freezing all the other model parameters, which is globally discrimination-aware due to the discrete setting.

### Desirable Companion for Vertical Federated Learning: New Zeroth-Order Gradient Based Algorithm

• Computer Science
CIKM
• 2021
This paper reveals that zeroth-order optimization (ZOO) is a desirable companion for VFL and proposes a novel and practical VFL framework with black-box models, which is inseparably interconnected to the promising properties of ZOO.

### AsySQN: Faster Vertical Federated Learning Algorithms with Better Computation Resource Utilization

• Computer Science
KDD
• 2021
An asynchronous stochastic quasi-Newton (AsySQN) framework for VFL is proposed, under which three algorithms making descent steps scaled by approximate Hessian information convergence much faster than SGD-based methods in practice and thus can dramatically reduce the number of communication rounds.

### On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

• Computer Science
NeurIPS
• 2020
This work reexamines the effectiveness of RARL under a fundamental robust control setting, the linear quadratic (LQ) case, and proposes several other policy-based RARL algorithms whose convergence behaviors are studied both empirically and theoretically.

### Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization

• Computer Science
J. Mach. Learn. Res.
• 2022
It is proved that the Acc-ZOM method achieves a lower query complexity of $\tilde{O}(d\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which improves the best known result by a factor of $O(d)$, where $d$ denotes the parameter dimension.

## References

Showing 1-10 of 41 references

### Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization

• Computer Science
IJCAI
• 2019
A class of fast zeroth-order stochastic ADMM methods for solving nonconvex problems with multiple nonsmooth penalties is proposed, based on the coordinate smoothing gradient estimator; these methods not only reach the best convergence rate for nonconvex optimization, but also effectively solve many complex machine learning problems with multiple regularized penalties and constraints.

### Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization

• Computer Science
AAAI
• 2019
A class of faster zeroth-order proximal stochastic methods with the variance reduction techniques of SVRG and SAGA is proposed, denoted as ZO-ProxSVRG and ZO-ProxSAGA, respectively.

### Stochastic Zeroth-order Optimization via Variance Reduction method

• Computer Science
ArXiv
• 2018
This paper introduces a novel Stochastic Zeroth-order method with Variance Reduction under Gaussian smoothing (SZVR-G) and establishes the complexity for optimizing non-convex problems and successfully applies the method to conduct a universal black-box attack to deep neural networks.

### Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization

• Kaiyi Ji, Zhe Wang
• Computer Science
ICML
• 2019
A new algorithm is developed, which is free from Gaussian variable generation and allows a large constant stepsize while maintaining the same convergence rate and query complexity, and it is shown that ZO-SPIDER-Coord automatically achieves a linear convergence rate as the iterate enters into a local PL region without restart and algorithmic modification.

### Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization

• Computer Science
NeurIPS
• 2018
Two accelerated versions of ZO-SVRG utilizing variance reduced gradient estimators are proposed, which achieve the best rate known for ZO stochastic optimization (in terms of iterations) and strike a balance between the convergence rate and the function query complexity.
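The SVRG-style control variate behind ZO-SVRG can be sketched as follows: the same random direction is reused for the component estimate at the current iterate and at the snapshot, so the two finite differences cancel as the iterate approaches the snapshot. Everything below (the toy problem, step size, and loop lengths) is illustrative, not the authors' setup:

```python
import numpy as np

def zo_grad(f, x, mu, u):
    """Two-point zeroth-order estimate of grad f(x) along direction u."""
    return (f(x + mu * u) - f(x)) / mu * u

# Toy finite sum: f(x) = (1/n) * sum_i 0.5*||x - a_i||^2, minimized at mean(a_i).
rng = np.random.default_rng(0)
n, d, mu, eta = 20, 5, 1e-5, 0.1
A = 3.0 + rng.standard_normal((n, d))
f_i = lambda i, z: 0.5 * np.sum((z - A[i]) ** 2)

x = np.zeros(d)
for epoch in range(30):
    x_snap = x.copy()
    # Full zeroth-order gradient at the snapshot, averaged over directions.
    g_snap = np.mean([zo_grad(lambda z: f_i(i, z), x_snap, mu,
                              rng.standard_normal(d))
                      for i in range(n) for _ in range(10)], axis=0)
    for _ in range(n):
        i = rng.integers(n)
        u = rng.standard_normal(d)      # shared direction couples the pair
        v = (zo_grad(lambda z: f_i(i, z), x, mu, u)
             - zo_grad(lambda z: f_i(i, z), x_snap, mu, u)
             + g_snap)                  # SVRG-style control variate
        x -= eta * v

F = lambda z: np.mean([f_i(i, z) for i in range(n)])
```

Because the correction term shrinks with the distance to the snapshot, the estimator's variance decays as the method converges, which is what yields the improved rates over plain zeroth-order SGD.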

### Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization

• Computer Science
ICML
• 2019
The theoretical analysis shows that the online SPIDER-ADMM has an IFO complexity of $\mathcal{O}(\epsilon^{-\frac{3}{2}})$, which improves the existing best results by a factor of $n$; experimental results on benchmark datasets validate that the proposed algorithms converge faster than the existing ADMM algorithms for nonconvex optimization.

### Stochastic Alternating Direction Method of Multipliers

• Computer Science, Mathematics
ICML
• 2013
This paper establishes the convergence rate of ADMM for convex problems in terms of both the objective value and the feasibility violation, and proposes a stochastic ADMM algorithm for optimization problems with non-smooth composite objective functions.
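For reference, the deterministic ADMM updates that this line of work makes stochastic can be sketched on a lasso problem, a standard textbook instance of a composite objective with a nonsmooth penalty (all values below are illustrative):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """Classical ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
    m, d = A.shape
    x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
    AtA, Atb = A.T @ A, A.T @ b
    inv = np.linalg.inv(AtA + rho * np.eye(d))  # cached x-update solve
    for _ in range(iters):
        x = inv @ (Atb + rho * (z - u))         # smooth quadratic subproblem
        z = soft_threshold(x + u, lam / rho)    # proximal (shrinkage) step
        u += x - z                              # dual ascent on x = z
    return z

# Usage: recover a sparse signal from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10); x_true[:3] = [1.0, -2.0, 1.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = admm_lasso(A, b)
```

The stochastic variants replace the exact x-update with one driven by sampled (first- or zeroth-order) gradient information, while keeping the same splitting structure.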

### Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization

• Computer Science
Math. Program.
• 2016
A randomized stochastic projected gradient (RSPG) algorithm is proposed, in which a proper mini-batch of samples is taken at each iteration depending on the total budget of stochastic samples allowed; the analysis shows nearly optimal complexity of the algorithm for convex stochastic programming.

### Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming

• Computer Science, Mathematics
SIAM J. Optim.
• 2013
This paper discusses a variant of the algorithm which applies a post-optimization phase to evaluate a short list of solutions generated by several independent runs of the RSG method, and shows that such a modification significantly improves the large-deviation properties of the algorithm.

### SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization

• Computer Science
ArXiv
• 2018
SpiderBoost is proposed as an improved scheme that allows much larger stepsize without sacrificing the convergence rate, and hence runs substantially faster in practice, and extends much more easily to proximal algorithms with guaranteed convergence for solving composite optimization problems.
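The recursive estimator that distinguishes SPIDER/SpiderBoost from SVRG can be sketched as follows, here on a first-order least-squares toy (the step size, refresh period, and problem are illustrative, not the paper's settings):

```python
import numpy as np

def spider_boost(grad_i, n, x0, eta=0.05, q=10, iters=400, rng=None):
    """SpiderBoost sketch: SPIDER's recursive estimator
    v_t = grad_i(x_t) - grad_i(x_{t-1}) + v_{t-1}, refreshed with a full
    gradient every q steps, combined with a constant step size eta."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    v = np.mean([grad_i(i, x) for i in range(n)], axis=0)  # full gradient
    for t in range(1, iters):
        x_prev, x = x, x - eta * v
        if t % q == 0:   # periodic full-gradient refresh
            v = np.mean([grad_i(i, x) for i in range(n)], axis=0)
        else:            # recursive correction on one sampled component
            i = rng.integers(n)
            v = grad_i(i, x) - grad_i(i, x_prev) + v
    return x

# Usage: least-squares finite sum f_i(x) = 0.5*(a_i^T x - b_i)^2.
rng = np.random.default_rng(1)
n, d = 40, 8
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true
g = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = spider_boost(g, n, np.zeros(d), rng=rng)
```

Unlike SVRG, the correction is applied against the previous iterate rather than a fixed snapshot, which is what permits the larger constant step size the summary above refers to.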