A Principle of Least Action for the Training of Neural Networks

@article{Karkar2020APO,
  title={A Principle of Least Action for the Training of Neural Networks},
  author={Skander Karkar and Ibrahim Ayed and Emmanuel de B{\'e}zenac and Patrick Gallinari},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.08372}
}
Neural networks have been achieving high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much effort has recently been focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and gaining better control over the trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system…
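
To make the dynamical-system reading concrete, here is a minimal sketch (not the authors' code; the architecture, penalty form and weight are illustrative assumptions): a residual network is read as an explicit Euler discretization of a flow, and a least-action-style cost sums the squared residual updates, i.e. the discrete kinetic energy of the induced transport.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One Euler step x_{k+1} = x_k + h * f_k(x_k) of the induced flow."""
    def __init__(self, dim, h=0.1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.h = h

    def forward(self, x):
        v = self.f(x)                      # velocity field at this layer / "time step"
        return x + self.h * v, v

class ResNetAsFlow(nn.Module):
    """Stack of residual blocks that also returns a least-action-style cost:
    the sum of squared velocities (discrete kinetic energy of the transport)."""
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList(ResidualBlock(dim) for _ in range(depth))

    def forward(self, x):
        action = x.new_zeros(())
        for block in self.blocks:
            x, v = block(x)
            action = action + (v ** 2).sum(dim=1).mean()
        return x, action

model = ResNetAsFlow(dim=16, depth=8)
x = torch.randn(32, 16)
out, action = model(x)
loss = out.pow(2).mean() + 1e-3 * action   # placeholder task loss + illustrative weight
loss.backward()
```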

Adaptable Hamiltonian neural networks

TLDR
This work introduces a class of HNNs capable of adaptable prediction of nonlinear physical systems and demonstrates, using paradigmatic Hamiltonian systems, that training the HNN with time series from as few as four parameter values endows the neural machine with the ability to predict the state of the target system over an entire parameter interval.
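
For context, a minimal sketch of the basic HNN construction this line of work builds on (a generic Hamiltonian neural network, not the adaptable variant itself; the data tensors are placeholders): a network parameterizes a scalar H(q, p), and the predicted dynamics are obtained from Hamilton's equations via autograd.

```python
import torch
import torch.nn as nn

class HNN(nn.Module):
    """Learns a scalar Hamiltonian H(q, p); dynamics follow Hamilton's equations
    dq/dt = dH/dp, dp/dt = -dH/dq, computed with autograd."""
    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))
        self.dim = dim

    def time_derivative(self, z):
        z = z.requires_grad_(True)
        H = self.H(z).sum()
        dH = torch.autograd.grad(H, z, create_graph=True)[0]
        dHdq, dHdp = dH[:, :self.dim], dH[:, self.dim:]
        return torch.cat([dHdp, -dHdq], dim=1)

# Training target: time derivatives of observed (q, p) trajectories
# (here replaced by random placeholders).
model = HNN(dim=1)
z = torch.randn(128, 2)
dz_true = torch.randn(128, 2)
loss = ((model.time_derivative(z) - dz_true) ** 2).mean()
loss.backward()
```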

Turning Normalizing Flows into Monge Maps with Geodesic Gaussian Preserving Flows

Normalizing Flows (NF) are powerful likelihood-based generative models that are able to trade off between expressivity and tractability to model complex densities. A now well established research…
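
As background for this entry (a generic sketch, not the geodesic Gaussian-preserving construction of the paper), a normalizing flow is trained by maximum likelihood through the change-of-variables formula log p(x) = log p_Z(f(x)) + log |det J_f(x)|, shown here with a simple invertible affine map.

```python
import math
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    """Elementwise invertible map z = exp(s) * x + t with tractable log|det J|."""
    def __init__(self, dim):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(dim))
        self.t = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        z = x * torch.exp(self.s) + self.t
        log_det = self.s.sum().expand(x.shape[0])   # log|det J| is input-independent here
        return z, log_det

def log_prob(flow, x):
    # Change of variables with a standard normal base density.
    z, log_det = flow(x)
    base = (-0.5 * z ** 2 - 0.5 * math.log(2 * math.pi)).sum(dim=1)
    return base + log_det

flow = AffineFlow(dim=2)
x = torch.randn(64, 2)
nll = -log_prob(flow, x).mean()   # maximum-likelihood training objective
nll.backward()
```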

Mapping conditional distributions for domain adaptation under generalized target shift

TLDR
A novel and general approach to aligning pretrained representations that circumvents existing drawbacks: it learns an optimal transport map, implemented as a NN, which maps source representations onto target ones.

References


On the Spectral Bias of Neural Networks

TLDR
This work shows that deep ReLU networks are biased towards low-frequency functions, and studies the robustness of the frequency components with respect to parameter perturbation, developing the intuition that parameters must be finely tuned to express high-frequency functions.
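
A toy illustration of the reported bias (an assumed setup, not the paper's experiments): fit a 1-D target mixing a low and a high frequency and track the projection of the fit onto each component; the low-frequency coefficient typically approaches 1 first.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-1, 1, 512).unsqueeze(1)
low = torch.sin(2 * math.pi * x)          # low-frequency component
high = torch.sin(20 * math.pi * x)        # high-frequency component
y = low + high

net = nn.Sequential(nn.Linear(1, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2001):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            pred = net(x)
            # Least-squares projection of the fit onto each component (target coefficient is 1).
            c_low = (pred * low).mean() / (low ** 2).mean()
            c_high = (pred * high).mean() / (high ** 2).mean()
            print(step, round(float(c_low), 3), round(float(c_high), 3))
```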

Sensitivity and Generalization in Neural Networks: an Empirical Study

TLDR
It is found that trained neural networks are more robust to input perturbations in the vicinity of the training data manifold, as measured by the norm of the input-output Jacobian of the network, and that this norm correlates well with generalization.
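
A short sketch of the sensitivity measure mentioned here (an assumed setup, not the paper's code): the Frobenius norm of the input-output Jacobian at a given input, computed with autograd.

```python
import torch
import torch.nn as nn

def jacobian_frobenius_norm(model, x):
    """Frobenius norm of d(model)/dx at a single input x of shape [in_dim]."""
    jac = torch.autograd.functional.jacobian(model, x)   # shape [out_dim, in_dim]
    return jac.pow(2).sum().sqrt()

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))
x = torch.randn(10)
print(float(jacobian_frobenius_norm(model, x)))
```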

Reversible Architectures for Arbitrarily Deep Residual Neural Networks

TLDR
From this interpretation, a theoretical framework on stability and reversibility of deep neural networks is developed, and three reversible neural network architectures that can go arbitrarily deep in theory are derived.
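
A minimal sketch of the additive-coupling reversible block that such architectures build on (generic RevNet-style coupling, not necessarily one of the three architectures derived in the paper): the input split is recovered exactly from the output, so intermediate activations need not be stored.

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Additive coupling: (y1, y2) = (x1 + F(x2), x2 + G(y1)); exactly invertible."""
    def __init__(self, dim):
        super().__init__()
        self.F = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.G = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, x1, x2):
        y1 = x1 + self.F(x2)
        y2 = x2 + self.G(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.G(y1)
        x1 = y1 - self.F(x2)
        return x1, x2

block = ReversibleBlock(dim=8)
x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
y1, y2 = block(x1, x2)
r1, r2 = block.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-6), torch.allclose(r2, x2, atol=1e-6))
```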

On Residual Networks Learning a Perturbation from Identity

TLDR
A stopping rule is developed that decides the depth of the residual network based on the average perturbation magnitude falling below a given epsilon, and it is found that sufficiently large residual networks learn a perturbation from identity.
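
A rough sketch of how the quantity behind such a stopping rule could be monitored (the tolerance and selection rule below are illustrative assumptions, not the paper's exact criterion): the average relative magnitude of each residual update, compared against a threshold epsilon.

```python
import torch
import torch.nn as nn

def average_perturbation_magnitudes(blocks, x):
    """Per-block mean of ||F(x)|| / ||x||: how far each residual update moves
    the representation relative to its current size."""
    mags = []
    for block in blocks:
        f = block(x)
        ratio = f.norm(dim=1) / x.norm(dim=1).clamp_min(1e-8)
        mags.append(ratio.mean().item())
        x = x + f                          # apply the residual update
    return mags

blocks = nn.ModuleList(nn.Sequential(nn.Linear(16, 16), nn.Tanh()) for _ in range(12))
x = torch.randn(32, 16)
mags = average_perturbation_magnitudes(blocks, x)
eps = 0.05                                 # hypothetical tolerance
depth = next((i for i, m in enumerate(mags) if m < eps), len(blocks))
print(mags, "suggested depth:", depth)
```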

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

TLDR
It is shown that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.
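
A toy reproduction of the kind of dynamics studied here (an assumed setup): a two-layer linear network trained by gradient descent on a linear teacher with well-separated singular values, whose loss curve typically shows plateaus followed by rapid drops as the modes are learned in sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.standard_normal((n, d))
teacher = np.diag([5.0, 3.0, 1.0] + [0.2] * (d - 3))   # well-separated singular values
Y = X @ teacher

# Two-layer linear network Y_hat = X @ W1 @ W2, small random initialization.
W1 = 0.01 * rng.standard_normal((d, d))
W2 = 0.01 * rng.standard_normal((d, d))
lr = 0.01
for step in range(3001):
    E = X @ W1 @ W2 - Y
    loss = (E ** 2).sum() / (2 * n)
    gW2 = (X @ W1).T @ E / n               # exact gradients of the quadratic loss
    gW1 = X.T @ E @ W2.T / n
    W1 -= lr * gW1
    W2 -= lr * gW2
    if step % 250 == 0:
        print(step, round(float(loss), 4))
```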

Understanding deep learning requires rethinking generalization

TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
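
A compact sketch of the randomization test described here, at toy scale rather than the paper's image benchmarks: shuffle the labels of a small dataset and check that an over-parameterized network still drives training error to (near) zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, classes = 256, 32, 10
X = torch.randn(n, d)
y_random = torch.randint(0, classes, (n,))      # labels carry no information

net = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, classes))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(3000):
    opt.zero_grad()
    loss = loss_fn(net(X), y_random)
    loss.backward()
    opt.step()

acc = (net(X).argmax(dim=1) == y_random).float().mean()
print("training accuracy on random labels:", float(acc))   # typically close to 1.0
```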

Maximum Principle Based Algorithms for Deep Learning

TLDR
The continuous dynamical system approach to deep learning is explored in order to devise alternative training frameworks based on Pontryagin's maximum principle, demonstrating a favorable initial convergence rate per iteration, provided Hamiltonian maximization can be carried out efficiently.

To understand deep learning we need to understand kernel learning

TLDR
It is argued that progress on understanding deep learning will be difficult until more tractable "shallow" kernel methods are better understood, and that new theoretical ideas are needed to understand the properties of classical kernel methods.

Reconciling modern machine-learning practice and the classical bias–variance trade-off

TLDR
This work shows how classical theory and modern practice can be reconciled within a single unified performance curve, proposes a mechanism underlying its emergence, and provides evidence for the existence and ubiquity of double descent across a wide spectrum of models and datasets.

Reconciling modern machine learning and the bias-variance trade-off

TLDR
A new "double descent" risk curve is exhibited that extends the traditional U-shaped bias-variance curve beyond the point of interpolation and shows that the risk of suitably chosen interpolating predictors from these models can, in fact, be decreasing as the model complexity increases, often below the risk achieved using non-interpolating models.