Corpus ID: 32711926

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning

@inproceedings{Ma2018ThePO,
  title={The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning},
  author={Siyuan Ma and Raef Bassily and Mikhail Belkin},
  booktitle={ICML},
  year={2018}
}
  • Abstract: In this paper we aim to formally explain the phenomenon of fast convergence of SGD observed in modern machine learning. [...] Finally, we show how our results fit into recent developments in training deep neural networks, and discuss connections to adaptive rates for SGD and variance reduction.
