# The effective noise of stochastic gradient descent

@article{Mignacco2021TheEN, title={The effective noise of stochastic gradient descent}, author={Francesca Mignacco and Pierfrancesco Urbani}, journal={Journal of Statistical Mechanics: Theory and Experiment}, year={2021}, volume={2022} }

Stochastic gradient descent (SGD) is the workhorse algorithm of deep learning technology. At each step of the training phase, a mini-batch of samples is drawn from the training dataset, and the weights of the neural network are adjusted according to the performance on this specific subset of examples. The mini-batch sampling procedure introduces stochastic dynamics into the gradient descent, with a non-trivial, state-dependent noise. We characterize the stochasticity of SGD and a recently…
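The mini-batch noise described above can be made concrete with a minimal sketch (not the paper's method): on a toy least-squares problem, the SGD update uses the gradient of a random subset of samples, and the gap between the mini-batch gradient and the full-batch gradient is the effective noise, whose statistics depend on the current weights. All names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression dataset (hypothetical, for illustration only).
n_samples, dim = 200, 5
X = rng.normal(size=(n_samples, dim))
w_true = rng.normal(size=dim)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

def grad(w, idx):
    """Mean-squared-error gradient evaluated on the subset of samples `idx`."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(dim)
batch_size, lr = 10, 0.05
full = np.arange(n_samples)

for step in range(500):
    batch = rng.choice(n_samples, size=batch_size, replace=False)
    # One SGD step: full gradient plus a state-dependent mini-batch fluctuation.
    w -= lr * grad(w, batch)

# The effective noise is the mini-batch gradient minus the full gradient;
# its covariance depends on the current weights w (state-dependent noise).
noise = np.array([
    grad(w, rng.choice(n_samples, size=batch_size, replace=False)) - grad(w, full)
    for _ in range(1000)
])
print(noise.mean(axis=0))  # close to 0: the mini-batch noise is unbiased
print(noise.std(axis=0))   # nonzero: fluctuations around the full gradient
```

Re-estimating `noise` at a different point `w` would give a different covariance, which is the sense in which the SGD noise is state-dependent rather than a fixed additive term.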

## 6 Citations

### A decision tree model for the prediction of the stay time of ships in Brazilian ports

- Computer Science · Engineering Applications of Artificial Intelligence
- 2023

### Dynamical Mean Field Theory of Kernel Evolution in Wide Neural Networks

- Computer Science
- 2022

A collection of deterministic dynamical order parameters, namely inner-product kernels of hidden-unit activations and gradients in each layer at pairs of time points, is constructed, providing a reduced description of network activity through training.

### Rigorous dynamical mean field theory for stochastic gradient descent methods

- Computer Science · ArXiv
- 2022

Closed-form equations are proved for the exact high-dimensional asymptotics of a family of first-order gradient-based methods that learn an estimator from observations of Gaussian data via empirical risk minimization.

### The high-d landscapes paradigm: spin-glasses, and beyond

- Computer Science
- 2022

This chapter focuses in particular on the problem of characterizing the landscape topology and geometry, discussing techniques to count and classify its stationary points and stressing connections with the statistical physics of disordered systems and with random matrix theory.

### Subaging in underparametrized deep neural networks

- Computer Science · Machine Learning: Science and Technology
- 2022

We consider a simple classification problem to show that the dynamics of finite-width deep neural networks in the underparametrized regime gives rise to effects similar to those associated with…

### Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks

- Computer Science · ArXiv
- 2022

Comparisons of the self-consistent solution with various approximation schemes, including the static NTK approximation, the gradient-independence assumption, and leading-order perturbation theory, are provided, showing that each of these approximations can break down in regimes where the general self-consistent solution still provides an accurate description.

## References

Showing 1–10 of 56 references.

### Journal of Physics A: Mathematical and Theoretical 44

- 483001
- 2011

### Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

- Computer Science · Machine Learning: Science and Technology
- 2021

Dynamical mean-field theory from statistical physics is applied to characterize analytically the full trajectories of gradient-based algorithms in their continuous-time limit, with a warm start and for large system sizes, unveiling several intriguing properties of the landscape and of the algorithms.

### The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima

- Computer Science · Proceedings of the National Academy of Sciences
- 2021

By analyzing SGD-based learning dynamics together with the loss-function landscape, a robust inverse relation between weight fluctuation and loss-landscape flatness, opposite to the fluctuation–dissipation relation in physics, is discovered.


### How to study a persistent active glassy system

- Mathematics · Journal of Physics: Condensed Matter
- 2021

A recently proposed scheme is described that allows one to study the dynamics directly in the large-persistence-time limit, on timescales around and well above the persistence time.

### Understanding deep learning is also a job for physicists

- Physics, Computer Science
- 2020

A physics-based approach to automated learning from data by means of deep neural networks may help to bridge the gap between theoretical and practical applications.

### Theory of Simple Glasses: Exact Solutions in Infinite Dimensions

- Physics
- 2020

This pedagogical and self-contained text describes the modern mean field theory of simple structural glasses. The book begins with a thorough explanation of infinite-dimensional models in statistical…

### Poly-time universality and limitations of deep learning

- Computer ScienceArXiv
- 2020

SGD is shown to be universal even with some polynomial noise, while full GD or SQ algorithms are not (e.g., on parities); this also gives a separation between SGD-based deep learning and statistical-query algorithms.

### Force balance controls the relaxation time of the gradient descent algorithm in the satisfiable phase.

- Computer Science · Physical Review E
- 2020

The relaxation dynamics of the single-layer perceptron with a spherical constraint is studied numerically; the estimated critical exponent of the relaxation time in the nonconvex region agrees very well with that of frictionless spherical particles, which have been studied in the context of the jamming transition of granular materials.