# Non-convergence of stochastic gradient descent in the training of deep neural networks

@article{Cheridito2020NonconvergenceOS, title={Non-convergence of stochastic gradient descent in the training of deep neural networks}, author={Patrick Cheridito and Arnulf Jentzen and Florian Rossmannek}, journal={ArXiv}, year={2020}, volume={abs/2006.07075} }

Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the size of the training data; (iii) the number of gradient steps; and (iv) the number of randomly initialized gradient trajectories. While it can be shown…
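To make the four discretization parameters named in the abstract concrete, the following is a minimal sketch, not the authors' setup. Everything in it is an illustrative assumption: the function names (`train`, `sgd_step`, and so on), the one-hidden-layer ReLU network (the paper concerns deep networks), the target function x² on [-1, 1], and the learning rate. It only shows where each parameter enters a training loop: `width` stands in for (i) the architecture, `n_samples` for (ii) the training data size, `n_steps` for (iii) the number of gradient steps, and `n_inits` for (iv) the number of randomly initialized trajectories.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def init_params(width, rng):
    # Hypothetical random initialization of a one-hidden-layer ReLU network.
    return {
        "w": [rng.gauss(0, 1) for _ in range(width)],          # hidden weights
        "b": [rng.gauss(0, 1) for _ in range(width)],          # hidden biases
        "v": [rng.gauss(0, 1) / width for _ in range(width)],  # output weights
        "c": 0.0,                                              # output bias
    }


def predict(p, x):
    hidden = (v * relu(w * x + b) for w, b, v in zip(p["w"], p["b"], p["v"]))
    return p["c"] + sum(hidden)


def empirical_risk(p, data):
    # Mean squared error over the training sample.
    return sum((predict(p, x) - y) ** 2 for x, y in data) / len(data)


def sgd_step(p, x, y, lr):
    # One SGD step on the squared loss for a single sample (x, y).
    err = predict(p, x) - y
    for i in range(len(p["w"])):
        pre = p["w"][i] * x + p["b"][i]
        active = 1.0 if pre > 0.0 else 0.0
        grad_v = 2.0 * err * relu(pre)
        grad_w = 2.0 * err * p["v"][i] * active * x
        grad_b = 2.0 * err * p["v"][i] * active
        p["v"][i] -= lr * grad_v
        p["w"][i] -= lr * grad_w
        p["b"][i] -= lr * grad_b
    p["c"] -= lr * 2.0 * err


def train(width, n_samples, n_steps, n_inits, lr=0.01, seed=0):
    rng = random.Random(seed)
    # (ii) training data: n_samples points of the assumed target y = x^2
    data = [(x, x * x) for x in (rng.uniform(-1, 1) for _ in range(n_samples))]
    risks = []
    for _ in range(n_inits):            # (iv) random initializations
        p = init_params(width, rng)     # (i) architecture, here only the width
        for _ in range(n_steps):        # (iii) gradient steps
            x, y = rng.choice(data)
            sgd_step(p, x, y, lr)
        risks.append(empirical_risk(p, data))
    return risks


if __name__ == "__main__":
    risks = train(width=4, n_samples=20, n_steps=200, n_inits=3)
    print("empirical risk per trajectory:", risks)
    print("best trajectory:", min(risks))
```

Running several trajectories and keeping the one with the smallest empirical risk is one common way to use parameter (iv); whether it matches the selection rule analyzed in the paper is not stated in the excerpt above.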
