Better scalability under potentially heavy-tailed feedback
@article{Holland2020BetterSU,
  title   = {Better scalability under potentially heavy-tailed feedback},
  author  = {Matthew J. Holland},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2012.07346}
}
We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we instead focus computational effort on robustly choosing (or newly constructing) a strong candidate based on a collection of cheap…
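To make the "cheap sub-processes plus robust selection" idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the linear model, squared loss, the split-into-k-chunks scheme, and the distance-based selection rule in robust_select are all illustrative assumptions chosen to show the general divide-and-select pattern, where plain SGD runs cheaply on data subsets and robustness is spent only on picking a single strong candidate.

```python
import numpy as np

def cheap_sgd(X, y, lr=0.01, epochs=5, rng=None):
    """One cheap sub-process: plain SGD on squared loss for a linear model."""
    rng = rng or np.random.default_rng()
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

def robust_select(candidates):
    """Illustrative robust selection: keep the candidate whose median
    distance to the other candidates is smallest, so a few sub-processes
    derailed by heavy-tailed data cannot dominate the final choice."""
    W = np.stack(candidates)
    dists = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=2)
    scores = np.median(dists, axis=1)
    return W[np.argmin(scores)]

def divide_and_select(X, y, k=8, seed=0):
    """Split the data into k chunks, run one cheap SGD per chunk
    (these could run in parallel), then robustly pick one candidate."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    chunks = np.array_split(idx, k)
    candidates = [cheap_sgd(X[c], y[c], rng=rng) for c in chunks]
    return robust_select(candidates)
```

Under these assumptions, the per-step cost is that of ordinary SGD; the robust step touches only k candidate vectors rather than a gradient aggregate at every iteration, which is the scalability point the abstract emphasizes.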