Scalable Adaptive Stochastic Optimization Using Random Projections

@article{Krummenacher2016ScalableAS,
  title={Scalable Adaptive Stochastic Optimization Using Random Projections},
  author={Gabriel Krummenacher and Brian McWilliams and Yannic Kilcher and Joachim M. Buhmann and Nicolai Meinshausen},
  journal={ArXiv},
  year={2016},
  volume={abs/1611.06652}
}
Adaptive stochastic gradient methods such as AdaGrad have gained popularity, in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second-order information by accumulating past gradients, which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain better performance; however, in high dimensions it is computationally impractical. We present Ada-LR and RadaGrad, two computationally efficient approximations to full-matrix AdaGrad based on randomized dimensionality reduction.
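The trade-off the abstract describes can be made concrete with a short sketch. Below is a minimal NumPy illustration of the diagonal and full-matrix AdaGrad updates, plus a hypothetical random-projection variant that keeps the second-moment statistics in a k-dimensional sketch. The function names, the Gaussian projection, and the handling of unsketched gradient directions are assumptions for exposition, not the paper's actual Ada-LR/RadaGrad pseudocode.

```python
import numpy as np

def adagrad_diag_step(w, g, G_diag, lr=0.1, eps=1e-8):
    """Diagonal AdaGrad: O(d) time and memory per step."""
    G_diag += g * g                              # running sum of squared gradients
    return w - lr * g / (np.sqrt(G_diag) + eps), G_diag

def adagrad_full_step(w, g, G_full, lr=0.1, eps=1e-8):
    """Full-matrix AdaGrad: O(d^2) memory and O(d^3) time for the
    inverse square root -- impractical when d is large."""
    G_full += np.outer(g, g)                     # running sum of gradient outer products
    vals, vecs = np.linalg.eigh(G_full)          # G_full is symmetric PSD
    inv_sqrt = vecs @ np.diag(1.0 / (np.sqrt(np.maximum(vals, 0.0)) + eps)) @ vecs.T
    return w - lr * inv_sqrt @ g, G_full

def adagrad_sketch_step(w, g, M, Pi, lr=0.1, eps=1e-8):
    """Illustrative random-projection variant (an assumption for exposition,
    NOT the paper's Ada-LR/RadaGrad pseudocode): keep the second-moment
    statistics in a k-dimensional sketch, k << d, so a step costs roughly
    O(kd + k^3) instead of O(d^2)/O(d^3)."""
    gp = Pi @ g                                  # project the gradient: O(kd)
    M += np.outer(gp, gp)                        # k x k sketch of sum_t g_t g_t^T
    vals, vecs = np.linalg.eigh(M)
    inv_sqrt = vecs @ np.diag(1.0 / (np.sqrt(np.maximum(vals, 0.0)) + eps)) @ vecs.T
    # Precondition in the sketch space and map back through Pi^T; gradient
    # components orthogonal to the sketch are simply dropped in this
    # simplification, which the paper's methods handle more carefully.
    return w - lr * (Pi.T @ (inv_sqrt @ gp)), M

# Usage sketch: a d-dimensional problem with a k-dimensional sketch.
rng = np.random.default_rng(0)
d, k = 1000, 20
w, g = np.zeros(d), rng.standard_normal(d)
Pi = rng.standard_normal((k, d)) / np.sqrt(k)    # Gaussian random projection
M = 1e-3 * np.eye(k)                             # small ridge keeps M well conditioned
w, M = adagrad_sketch_step(w, g, M, Pi)
```

With k much smaller than d, the sketched update never forms the d x d matrix, which is the basic reason randomized dimensionality reduction can make full-matrix-style adaptivity feasible in high dimensions.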

Citations

Publications citing this paper (11 in total):

Accelerating adaptive online learning by matrix approximation

  • International Journal of Data Science and Analytics
  • 2019

An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation


Adaptive Cost-Sensitive Online Classification

  • IEEE Transactions on Knowledge and Data Engineering
  • 2018