Corpus ID: 219635962

Non-convergence of stochastic gradient descent in the training of deep neural networks

@article{Cheridito2020NonconvergenceOS,
  title={Non-convergence of stochastic gradient descent in the training of deep neural networks},
  author={Patrick Cheridito and Arnulf Jentzen and Florian Rossmannek},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.07075}
}
  • Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the size of the training data; (iii) the number of gradient steps; and (iv) the number of randomly initialized gradient trajectories. While it can be shown…
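
A minimal sketch (not taken from the paper) of the training setup described in the abstract, to make the four discretization parameters concrete: (i) the network architecture, (ii) the size of the training data, (iii) the number of gradient steps, and (iv) the number of randomly initialized gradient trajectories. All function names, network sizes, and hyperparameter values below are illustrative assumptions, not the paper's construction.

# Sketch of SGD training of a shallow ReLU network with several random
# initializations; every numeric choice here is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

# (ii) size of the training data: n samples of a 1-d regression problem
n = 128
x = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(np.pi * x)

# (i) network architecture: one hidden ReLU layer of the given width
width = 32

def init_params():
    W1 = rng.normal(0.0, 1.0, size=(1, width))
    b1 = np.zeros(width)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1))
    b2 = np.zeros(1)
    return [W1, b1, W2, b2]

def forward(params, xb):
    W1, b1, W2, b2 = params
    h = np.maximum(xb @ W1 + b1, 0.0)  # ReLU activation
    return h @ W2 + b2, h

def sgd_trajectory(steps=2000, lr=0.05, batch=16):
    # (iii) number of gradient steps along one randomly initialized trajectory
    params = init_params()
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        xb, yb = x[idx], y[idx]
        out, h = forward(params, xb)
        # gradients of the loss 0.5 * mean((out - y)^2) via backpropagation
        err = (out - yb) / batch
        gW2 = h.T @ err
        gb2 = err.sum(axis=0)
        dh = (err @ params[2].T) * (h > 0.0)
        gW1 = xb.T @ dh
        gb1 = dh.sum(axis=0)
        for p, g in zip(params, [gW1, gb1, gW2, gb2]):
            p -= lr * g
    out, _ = forward(params, x)
    return float(np.mean((out - y) ** 2))

# (iv) number of randomly initialized gradient trajectories: run several and
# keep the best final training error
losses = [sgd_trajectory() for _ in range(5)]
print("final training errors per trajectory:", losses)
print("best trajectory:", min(losses))

The paper's non-convergence result concerns how these four parameters must jointly grow; the sketch only fixes one configuration to show where each parameter enters the training procedure.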
