Corpus ID: 173991015

A mean-field limit for certain deep neural networks

@article{Araujo2019AML,
  title={A mean-field limit for certain deep neural networks},
  author={Dyego Ara\'ujo and Roberto Imbuzeiro Oliveira and Daniel Yukimura},
  journal={arXiv: Statistics Theory},
  year={2019}
}
Understanding deep neural networks (DNNs) is a key challenge in the theory of machine learning, with potential applications to the many fields where DNNs have been successfully used. This article presents a scaling limit for a DNN being trained by stochastic gradient descent. Our networks have a fixed (but arbitrary) number $L\geq 2$ of inner layers; $N\gg 1$ neurons per layer; full connections between layers; and fixed weights (or "random features" that are not trained) near the input and…
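The abstract describes the setup concretely enough to sketch a toy version in code. The following is a minimal, hypothetical numpy sketch, not the authors' construction: the tanh activation, the $1/N$ scalings, the $N$-scaled learning rate, and all variable names are assumptions made here. It keeps the ingredients named in the abstract: $L$ fully connected trainable inner layers of width $N$, fixed untrained "random feature" weights at the boundary layers (here the first and last weight matrices), and training by stochastic gradient descent.

import numpy as np

rng = np.random.default_rng(0)
d, N, L, lr = 3, 200, 2, 0.1        # input dimension, width, inner layers, step size (all chosen here)
sigma = np.tanh                     # smooth activation, assumed for this sketch

W_in  = rng.normal(size=(N, d)) / np.sqrt(d)   # fixed "random features" near the input (not trained)
W_out = rng.normal(size=(1, N)) / N            # fixed weights near the output (not trained)
W = [rng.normal(size=(N, N)) for _ in range(L)]  # L trainable inner layers

def forward(x, W):
    h = sigma(W_in @ x)                  # x: (d,) -> first hidden state: (N,)
    hs = [h]
    for Wl in W:
        h = sigma((Wl @ h) / N)          # 1/N averaging over incoming neurons (mean-field-style scaling)
        hs.append(h)
    return (W_out @ h).item(), hs

def sgd_step(x, y, W):
    # one stochastic gradient step on the squared loss, via manual backprop (assumes sigma is tanh)
    yhat, hs = forward(x, W)
    err = yhat - y
    grad_h = err * W_out.ravel()                     # dLoss/d(last hidden state)
    for l in reversed(range(L)):
        pre = (W[l] @ hs[l]) / N
        delta = grad_h * (1.0 - np.tanh(pre) ** 2)   # backprop through tanh
        grad_W = np.outer(delta, hs[l]) / N
        grad_h = (W[l].T @ delta) / N                # gradient w.r.t. the previous hidden state
        W[l] -= (lr * N) * grad_W                    # N-scaled step size; a common mean-field convention,
                                                     # the paper's exact scaling may differ
    return 0.5 * err ** 2

# toy usage: regress y = sin(x_1 + ... + x_d) from random inputs
for step in range(1000):
    x = rng.normal(size=d)
    y = np.sin(x.sum())
    sgd_step(x, y, W)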

Citations

Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks (2020)
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs) that can be used to analyze neural network training. The analysis leads to the first global convergence proof for over-parameterized neural network training with more than $3$ layers in the mean-field regime, and to a simpler representation of DNNs for which the training objective can be reformulated as a convex optimization problem via suitable re-parameterization.
Mean Field Analysis of Deep Neural Networks
We analyze multi-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously…
Predicting the outputs of finite deep neural networks trained with noisy gradients
This work considers a DNN training protocol involving noise, weight decay and finite width, whose outcome corresponds to a certain non-Gaussian stochastic process; the deviation of this process from a Gaussian process (GP) is controlled by the finite width.
A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks
A mathematically rigorous framework for multilayer neural networks in the mean-field regime, built on the new idea of a non-evolving probability space that allows embedding neural networks of arbitrary widths; it proves a global convergence guarantee for two-layer and three-layer networks.
Global Convergence of Three-layer Neural Networks in the Mean Field Regime
This work develops a rigorous framework to establish the mean-field limit of three-layer networks under stochastic gradient descent training and proposes the idea of a neuronal embedding, which comprises a fixed probability space that encapsulates neural networks of arbitrary sizes.
An analytic theory of shallow networks dynamics for hinge loss classification
This paper studies in detail the training dynamics of a simple type of neural network, a single hidden layer trained to perform a classification task, and shows that in a suitable mean-field limit this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average node population.
Mathematical Models of Overparameterized Neural Networks
The analysis focuses on two-layer NNs, explains the key mathematical models together with their algorithmic implications, and discusses the challenges in understanding deep NNs.
Dynamics of Deep Neural Networks and Neural Tangent Hierarchy
An infinite hierarchy of ordinary differential equations, the neural tangent hierarchy (NTH), is derived, which captures the gradient descent dynamics of the deep neural network; it is proved that the truncated hierarchy approximates the dynamics of the neural tangent kernel (NTK) up to arbitrary precision.
Feature Learning in Infinite-Width Neural Networks
It is shown that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT, and that any such infinite-width limit can be computed using the Tensor Programs technique.

References

Showing 1-10 of 34 references
Mean Field Analysis of Deep Neural Networks
We analyze multi-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously…
Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit
This paper shows that the number of hidden units only needs to be larger than a quantity that depends on the regularity properties of the data and is independent of the dimension, and it generalizes this analysis to the case of unbounded activation functions.
Mean Field Analysis of Neural Networks
Machine learning, and in particular neural network models, have revolutionized fields such as image, text, and speech recognition. Today, many important real-world applications in these areas are…
A mean field view of the landscape of two-layer neural networks
A compact description of the SGD dynamics is derived in terms of a limiting partial differential equation that allows for "averaging out" some of the complexities of the landscape of neural networks and can be used to prove a general convergence result for noisy SGD.
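For orientation, the limiting PDE referred to there is, up to time rescaling and step-size factors omitted in this rough sketch, a continuity equation for the distribution $\rho_t$ of a single neuron's parameters $\theta$:
$$\partial_t \rho_t(\theta) \;=\; \nabla_\theta \cdot \big( \rho_t(\theta)\, \nabla_\theta \Psi(\theta; \rho_t) \big), \qquad \Psi(\theta; \rho) \;=\; V(\theta) + \int U(\theta, \theta')\, \rho(d\theta'),$$
where $V$ and $U$ are built from the data distribution and the unit's activation; this can be read as a Wasserstein gradient flow of the population risk.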
Scaling description of generalization with number of parameters in deep learning
This work relies on the so-called Neural Tangent Kernel, which connects large neural nets to kernel methods, to show that the initialization causes finite-size random fluctuations that affect the generalization error of neural networks.
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
Over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations, which yields a strong-convexity-like property showing that gradient descent converges at a global linear rate to the global optimum.
Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error
A Law of Large Numbers and a Central Limit Theorem for the empirical distribution are established, which together show that the approximation error of the network universally scales as $O(n^{-1})$, and the scale and nature of the noise introduced by stochastic gradient descent are quantified.
On Lazy Training in Differentiable Programming
This work shows that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
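For reference, "linearization around the initialization" means replacing the model $h(\theta, x)$ by its first-order Taylor expansion in the parameters (standard notation, not copied from that paper):
$$h(\theta, x) \;\approx\; h(\theta_0, x) + \langle \nabla_\theta h(\theta_0, x),\, \theta - \theta_0 \rangle,$$
so that training the parameters reduces to a linear (kernel) method with the tangent kernel $K(x, x') = \langle \nabla_\theta h(\theta_0, x), \nabla_\theta h(\theta_0, x') \rangle$.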
Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
This work uncovers a phenomenon in which the behavior of these complex networks, under suitable scalings and stochastic gradient descent dynamics, becomes independent of the number of neurons as this number grows sufficiently large.
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.