Corpus ID: 235458427

A Short Note of PAGE: Optimal Convergence Rates for Nonconvex Optimization

@article{Li2021ASN,
  title={A Short Note of PAGE: Optimal Convergence Rates for Nonconvex Optimization},
  author={Zhize Li},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.09663}
}
In this note, we first recall the nonconvex problem setting and introduce the optimal PAGE algorithm (Li et al., 2021). Then we provide a simple and clean convergence analysis of PAGE that achieves the optimal convergence rates. Moreover, PAGE and its analysis can easily be adopted and generalized to other works. We hope that this note provides useful insights and is helpful for future work.

1 Problem Setting
We consider the nonconvex optimization problem $\min_{x \in \mathbb{R}^d} f(x)$. The nonconvex function $f$ has…
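Since the abstract only sketches the method, the following is a minimal, illustrative Python sketch of the PAGE update as described in Li et al. (2021): with probability p the gradient estimator is reset to the full (or large-minibatch) gradient, and otherwise it is updated by a cheap recursive minibatch gradient difference. The function name `page` and the parameters `eta`, `p`, `b_prime`, and the per-component oracles `grad_fns` are illustrative choices, not notation from the note itself.

```python
import numpy as np

def page(grad_fns, x0, eta=0.05, p=0.1, b_prime=1, T=1000, seed=0):
    """Sketch of PAGE for min_x (1/n) sum_i f_i(x).

    grad_fns : list of per-component gradient oracles, grad_fns[i](x) = grad f_i(x)
    eta      : step size
    p        : probability of recomputing the full gradient
    b_prime  : minibatch size for the cheap gradient-difference update
    """
    rng = np.random.default_rng(seed)
    n = len(grad_fns)
    x = np.asarray(x0, dtype=float).copy()
    # g^0: full gradient at the initial point
    g = sum(f(x) for f in grad_fns) / n
    for _ in range(T):
        x_new = x - eta * g  # gradient step using the current estimator
        if rng.random() < p:
            # with probability p: reset to the full gradient (expensive branch)
            g = sum(f(x_new) for f in grad_fns) / n
        else:
            # with probability 1 - p: recursive minibatch correction (cheap branch);
            # indices are sampled with replacement for simplicity
            idx = rng.integers(0, n, size=b_prime)
            g = g + sum(grad_fns[i](x_new) - grad_fns[i](x) for i in idx) / b_prime
        x = x_new
    return x

# Toy usage: least-squares finite sum, f_i(x) = 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grad_fns = [lambda x, a=a, y=y: a * (a @ x - y) for a, y in zip(A, b)]
x_out = page(grad_fns, x0=np.zeros(5), eta=0.05, p=0.2, b_prime=4, T=2000)
```

The key design point is that most iterations take the cheap branch, so the expected per-iteration cost stays close to a minibatch SGD step while variance is controlled by the occasional full-gradient reset; the paper's analysis suggests choosing p on the order of the minibatch-to-batch ratio to obtain the optimal rates.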
CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression
TLDR
This work proposes CANITA, a compressed and accelerated gradient method for distributed optimization that combines the benefits of communication compression and convergence acceleration, achieving the first accelerated convergence rate in this setting.
FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning
TLDR
This work proposes FedPAGE, a new federated learning algorithm that further reduces communication complexity by using the recent optimal PAGE method (Li et al., 2021) in place of plain SGD in FedAvg.
ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation
TLDR
ZeroSARAH is the first variance-reduced method that requires no full gradient computations, not even at the initial point; it is expected to have practical impact in distributed and federated learning, where full device participation is impractical.

References

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization
TLDR
The results demonstrate that PAGE not only converges much faster than SGD in training but also achieves higher test accuracy, validating the theoretical results and confirming the practical superiority of PAGE.