
Implicit Gradient Regularization

  • David G. T. Barrett, B. Dherin
  • Published 2020
  • Computer Science, Mathematics
  • arXiv
  • Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients. We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization. We confirm empirically that implicit gradient regularization biases…
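The abstract's claim can be made concrete with backward error analysis: gradient descent with step size h tracks the gradient flow of a modified loss that adds a gradient-norm penalty, of the form L̃(θ) = L(θ) + (h/4)‖∇L(θ)‖². A minimal numerical sketch on a toy quadratic loss illustrates this; the matrix, step size, and all names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy ill-conditioned quadratic loss L(θ) = ½ θᵀAθ (illustrative assumption).
A = np.diag([10.0, 1.0])

def grad_L(theta):
    # Gradient of the original loss: ∇L(θ) = Aθ.
    return A @ theta

def grad_L_mod(theta, h):
    # Gradient of the modified loss L̃(θ) = L(θ) + (h/4)‖∇L(θ)‖²,
    # which for this quadratic works out to (A + (h/2) A²) θ.
    return (A + (h / 2.0) * (A @ A)) @ theta

def gradient_flow(grad_fn, theta, t, n_steps=20000):
    # Crude Euler integration of the flow θ' = -grad_fn(θ) for time t.
    dt = t / n_steps
    for _ in range(n_steps):
        theta = theta - dt * grad_fn(theta)
    return theta

h = 0.01
theta0 = np.array([1.0, 1.0])

gd_step = theta0 - h * grad_L(theta0)          # one discrete GD step
flow_plain = gradient_flow(grad_L, theta0, h)  # gradient flow on L
flow_mod = gradient_flow(lambda th: grad_L_mod(th, h), theta0, h)  # flow on L̃

err_plain = np.linalg.norm(gd_step - flow_plain)  # O(h²) discrepancy
err_mod = np.linalg.norm(gd_step - flow_mod)      # O(h³): markedly smaller
print(err_plain, err_mod)
```

The discrete GD step deviates from plain gradient flow at order h², but from the modified flow only at order h³; in this sense each finite step of gradient descent "implicitly" descends the gradient-norm-penalized loss.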
    1 Citation

    Gradient Regularisation as Approximate Variational Inference

