Gradient Descent with Identity Initialization Efficiently Learns Positive-Definite Linear Transformations by Deep Residual Networks

  @article{bartlett2019gradient,
    title={Gradient Descent with Identity Initialization Efficiently Learns Positive-Definite Linear Transformations by Deep Residual Networks},
    author={P. Bartlett and D. Helmbold and Philip M. Long},
    journal={Neural Computation},
    year={2019}
  }
  • P. Bartlett, D. Helmbold, Philip M. Long
  • Published 2019
  • Computer Science, Mathematics, Medicine
  • Neural Computation
  • We analyze algorithms for approximating a function f(x) = Φx mapping ℝ^d to ℝ^d using deep linear neural networks, that is, algorithms that learn a function h parameterized by matrices Θ_1, …, Θ_L and defined by h(x) = Θ_L Θ_{L−1} ⋯ Θ_1 x. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic. We provide polynomial bounds on the number of iterations for gradient descent to approximate the least-squares matrix Φ, in the case…
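The setting in the abstract can be sketched numerically. When the input distribution is isotropic (E[xx^T] = I), the population quadratic loss reduces to the matrix objective ½‖Θ_L ⋯ Θ_1 − Φ‖_F², so gradient descent can be run directly on the layer matrices. The sketch below uses identity initialization, as in the paper's title; the dimension, depth, step size, step count, and the particular positive-definite target Φ are illustrative assumptions, not the paper's constants or rates.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L, lr, steps = 3, 4, 0.1, 400

# Hypothetical positive-definite target near the identity (an assumption
# chosen so this small demo converges quickly; the paper analyzes a
# broader class of positive-definite Phi).
S = rng.standard_normal((d, d))
Phi = np.eye(d) + 0.05 * (S + S.T)

# Identity initialization: every layer starts at I, so the product is I.
Theta = [np.eye(d) for _ in range(L)]

def product(mats):
    """Return Theta_L ... Theta_1 for mats listed as [Theta_1, ..., Theta_L]."""
    W = np.eye(d)
    for T in mats:
        W = T @ W
    return W

for _ in range(steps):
    W = product(Theta)
    E = W - Phi  # residual of the end-to-end linear map
    # Gradient of (1/2)||W - Phi||_F^2 with respect to layer i is
    # (Theta_L ... Theta_{i+1})^T E (Theta_{i-1} ... Theta_1)^T.
    grads = [product(Theta[i + 1:]).T @ E @ product(Theta[:i]).T
             for i in range(L)]
    for i in range(L):
        Theta[i] -= lr * grads[i]

print("final error:", np.linalg.norm(product(Theta) - Phi))
```

Note that all gradients are computed from the same iterate before any layer is updated, matching full (population) gradient descent rather than a layer-by-layer sweep.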
    61 Citations
    • Gradient descent optimizes over-parameterized deep ReLU networks
    • A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
    • Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank
    • On the Convergence of Deep Networks with Sample Quadratic Overparameterization
    • On the Global Convergence of Training Deep Linear ResNets
    • Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
    • Global Convergence of Gradient Descent for Deep Linear Residual Networks
    • Asymptotic convergence rate of Dropout on shallow linear neural networks
    • Linearly Convergent Algorithms for Learning Shallow Residual Networks
