Corpus ID: 3994909

The Implicit Bias of Gradient Descent on Separable Data

@article{Soudry2018TheIB,
  title={The Implicit Bias of Gradient Descent on Separable Data},
  author={Daniel Soudry and Elad Hoffer and Suriya Gunasekar and Nathan Srebro},
  journal={J. Mach. Learn. Res.},
  year={2018},
  volume={19},
  pages={70:1-70:57}
}
  • Daniel Soudry, Elad Hoffer, Suriya Gunasekar, Nathan Srebro
  • Published in J. Mach. Learn. Res. 2018
  • Computer Science, Mathematics
  • We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow…
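
    A minimal sketch (not from the paper) of the behavior the abstract describes: running plain gradient descent on the unregularized logistic loss over linearly separable data makes the weight norm grow without bound while the normalized predictor w/||w|| slowly approaches the hard-margin (max-margin) direction. The synthetic data, step size, and iteration counts below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linearly separable Gaussian blobs, labels in {-1, +1}.
n = 100
X = np.vstack([rng.normal(+2.0, 0.5, size=(n, 2)),
               rng.normal(-2.0, 0.5, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

def logistic_loss_grad(w):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)).
    margins = y * (X @ w)
    coef = -y / (1.0 + np.exp(margins))   # derivative of the loss w.r.t. the margin
    return (X * coef[:, None]).mean(axis=0)

w = np.zeros(2)
eta = 0.1
for t in range(1, 100_001):
    w -= eta * logistic_loss_grad(w)
    if t in (10, 100, 1_000, 10_000, 100_000):
        w_dir = w / np.linalg.norm(w)
        min_margin = np.min(y * (X @ w_dir))   # margin of the normalized predictor
        print(f"t={t:6d}  ||w||={np.linalg.norm(w):8.2f}  "
              f"min margin of w/||w|| = {min_margin:.4f}")

# The norm keeps growing (roughly like log t), while the minimum margin of
# w/||w|| creeps up toward the hard-margin SVM value, consistent with the
# slow (logarithmic) directional convergence the paper proves.
```
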


    Citations

    Publications citing this paper (showing 1-10 of 189 citations):

    • Convergence of SGD in Learning ReLU Models with Separable Data (highly influenced; cites background & results; 10 excerpts)
    • On the Decision Boundary of Deep Neural Networks (highly influenced; cites background, results & methods; 8 excerpts)
    • On the Geometry of Adversarial Examples (highly influenced; cites methods; 7 excerpts)
    • When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models? (highly influenced; cites background, methods & results; 20 excerpts)
    • Finite-sample analysis of interpolating linear classifiers in the overparameterized regime (highly influenced; cites background & methods; 6 excerpts)
    • Gradient Descent Maximizes the Margin of Homogeneous Neural Networks (highly influenced; cites results, background & methods; 4 excerpts)
    • Bias of Homotopic Gradient Descent for the Hinge Loss (highly influenced; cites methods; 9 excerpts)


    Citation Statistics

    • 31 highly influenced citations
    • Averaged 50 citations per year from 2017 through 2019
    • 113% increase in citations per year in 2019 over 2018
