From complex to simple : hierarchical free-energy landscape renormalized in deep neural networks

@article{Yoshino2020FromCT,
  title={From complex to simple : hierarchical free-energy landscape renormalized in deep neural networks},
  author={Hajime Yoshino},
  journal={ArXiv},
  year={2020},
  volume={abs/1910.09918}
}
  • H. Yoshino
  • Published 22 October 2019
  • Computer Science
  • ArXiv
We develop a statistical mechanical approach based on the replica method to study the design space of deep and wide neural networks constrained to meet a large number of training data. Specifically, we analyze the configuration space of the synaptic weights and neurons in the hidden layers in a simple feed-forward perceptron network for two scenarios: a setting with random inputs/outputs and a teacher-student setting. By increasing the strength of constraints, i.e. increasing the number of… 
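
The abstract is truncated here, but the two training scenarios it names are easy to set up concretely. Below is a minimal NumPy sketch (not the paper's code) of a deep feed-forward perceptron with sign activations, with targets drawn either completely at random or from a fixed teacher network of the same architecture; all sizes, seeds, and names are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): the two data settings mentioned in the
# abstract -- random inputs/outputs vs. teacher-student -- for a deep feed-forward
# perceptron with sign activations.  All sizes and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, L, M = 64, 4, 200                    # width, hidden layers, training examples

def forward(weights, x):
    """Propagate inputs through the network with sign activations."""
    for W in weights:
        x = np.sign(W @ x / np.sqrt(W.shape[1]))   # 1/sqrt(N) keeps pre-activations O(1)
    return x

def random_network(widths):
    return [rng.standard_normal((n_out, n_in))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

widths = [N] * (L + 1)
inputs = np.sign(rng.standard_normal((N, M)))      # M random +/-1 input patterns

# Scenario 1: random inputs/outputs -- targets carry no structure at all.
random_targets = np.sign(rng.standard_normal((N, M)))

# Scenario 2: teacher-student -- targets are generated by a fixed "teacher" network
# of the same architecture that a "student" must reproduce.
teacher = random_network(widths)
teacher_targets = forward(teacher, inputs)

student = random_network(widths)                   # an untrained student, for reference
print("agreement with teacher:", np.mean(forward(student, inputs) == teacher_targets))
print("agreement with random :", np.mean(forward(student, inputs) == random_targets))
```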

Data-driven effective model shows a liquid-like deep learning

TLDR
This work proposes a statistical-mechanics framework that directly builds a least-structured model of the high-dimensional weight space, taking into account realistic structured data, stochastic gradient descent algorithms, and the computational depth of the network parametrized by its weights.

Sharp Asymptotics of Self-training with Linear Classifier

TLDR
This study develops a novel theoretical framework for sharply characterizing the generalization abilities of models trained by self-training (ST), using the non-rigorous replica method of statistical physics.
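
As a concrete illustration of the procedure being analyzed, here is a minimal scikit-learn sketch of self-training with a linear classifier: fit on a small labeled set, pseudo-label the unlabeled pool, keep confident predictions, and refit. The dataset, confidence threshold, and number of rounds are illustrative assumptions, not taken from the study.

```python
# Minimal sketch (not the paper's analysis): self-training with a linear classifier.
# A base model is fit on a small labeled set, then retrained on its own confident
# pseudo-labels for the unlabeled pool.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = rng.permutation(len(y))[:50]             # small labeled subset
unlabeled = np.setdiff1d(np.arange(len(y)), labeled)

clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

for _ in range(5):                                 # a few self-training rounds
    proba = clf.predict_proba(X[unlabeled]).max(axis=1)
    confident = unlabeled[proba > 0.9]             # keep only confident pseudo-labels
    pseudo = clf.predict(X[confident])
    X_train = np.vstack([X[labeled], X[confident]])
    y_train = np.concatenate([y[labeled], pseudo])
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("final accuracy on all points:", clf.score(X, y))
```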

Proliferation of non-linear excitations in the piecewise-linear perceptron

We investigate the properties of local minima of the energy landscape of a continuous non-convex optimization problem, the spherical perceptron with piecewise-linear cost function, and show that they…
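
For reference, the cost function in question is a linear "hinge": with gaps h_mu = w·ξ_mu/√N − κ and the spherical constraint |w|² = N, only negative gaps contribute, and they do so linearly. A minimal NumPy sketch with illustrative parameters (not the paper's code):

```python
# Minimal sketch (illustrative): the spherical perceptron with a piecewise-linear
# "hinge" cost.  Gaps h_mu = w.xi_mu / sqrt(N) - kappa; only negative gaps
# (unsatisfied constraints) contribute, linearly.
import numpy as np

rng = np.random.default_rng(0)
N, alpha, kappa = 200, 2.0, 0.5        # dimension, constraint density P/N, margin
P = int(alpha * N)

xi = rng.standard_normal((P, N))       # random constraint vectors
w = rng.standard_normal(N)
w *= np.sqrt(N) / np.linalg.norm(w)    # spherical constraint |w|^2 = N

gaps = xi @ w / np.sqrt(N) - kappa
energy = np.sum(np.maximum(-gaps, 0.0))   # piecewise-linear cost: sum of |h| over h < 0

print(f"{np.mean(gaps < 0):.2f} of constraints unsatisfied, energy = {energy:.1f}")
```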

Fundamental problems in Statistical Physics XIV: Lecture on Machine Learning

  • A. Decelle
  • Computer Science
  • Physica A: Statistical Mechanics and its Applications
  • 2022

References

SHOWING 1-10 OF 82 REFERENCES

Comparing dynamics: deep neural networks versus glassy systems

TLDR
The training dynamics of deep neural networks (DNNs) are analyzed numerically using methods developed in the statistical physics of glassy systems, suggesting that the dynamics slow down during training because of an increasingly large number of flat directions.
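
One glassy-physics observable used in this kind of analysis is a two-time correlation between the configuration at a waiting time t_w and the configuration at a later time. A minimal sketch of measuring such an overlap along a toy SGD run (the model, loss, and hyperparameters are illustrative assumptions, not the paper's setup):

```python
# Minimal sketch (illustrative): a two-time weight overlap C(t_w, t_final),
# the kind of observable used in glassy dynamics, measured along a toy SGD run.
import numpy as np

rng = np.random.default_rng(0)
N, P, lr, steps = 100, 300, 0.05, 2000
X = rng.standard_normal((P, N))
y = np.sign(rng.standard_normal(P))          # random labels: a hard, glass-like task
w = rng.standard_normal(N) / np.sqrt(N)

snapshots = {}
waits = [100, 400, 1600]
for t in range(steps + 1):
    if t in waits:
        snapshots[t] = w.copy()              # store the configuration at waiting time t_w
    batch = rng.integers(0, P, size=32)
    margins = y[batch] * (X[batch] @ w)
    # SGD on the hinge loss: gradient is -y*x for examples with margin < 1
    grad = -(y[batch][:, None] * X[batch] * (margins < 1)[:, None]).mean(axis=0)
    w = w - lr * grad

for tw, w_tw in snapshots.items():
    C = np.dot(w_tw, w) / (np.linalg.norm(w_tw) * np.linalg.norm(w))
    print(f"t_w = {tw:5d}:  C(t_w, t_final) = {C:.3f}")
```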

Opening the Black Box of Deep Neural Networks via Information

TLDR
This work demonstrates the effectiveness of the Information-Plane visualization of DNNs, shows that training time is dramatically reduced when more hidden layers are added, and argues that the main advantage of the hidden layers is computational.
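
The information plane tracks mutual informations such as I(T; Y) between a hidden representation T and the labels Y as training proceeds. Below is a crude binning estimator, sketched for a single untrained layer; the network, binning, and synthetic data are illustrative assumptions, not the paper's protocol.

```python
# Minimal sketch (illustrative): a crude binned estimate of the mutual information
# I(T; Y) between a hidden-layer activation T and labels Y, the quantity tracked
# on the "information plane".
import numpy as np

def mutual_information(codes, labels):
    """I(T; Y) in bits from discrete layer codes and discrete labels."""
    joint = {}
    for c, l in zip(codes, labels):
        joint[(c, l)] = joint.get((c, l), 0) + 1
    n = len(codes)
    pc, pl = {}, {}
    for (c, l), k in joint.items():
        pc[c] = pc.get(c, 0) + k
        pl[l] = pl.get(l, 0) + k
    return sum((k / n) * np.log2(k * n / (pc[c] * pl[l])) for (c, l), k in joint.items())

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 12))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # a simple synthetic label

W = rng.standard_normal((12, 3))                 # one untrained 3-unit layer, for illustration
T = np.tanh(X @ W)
bins = np.digitize(T, np.linspace(-1, 1, 4))     # coarse discretization of activations
codes = [tuple(row) for row in bins]             # one discrete code per example

print(f"I(T; Y) ~ {mutual_information(codes, y):.3f} bits")
```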

The jamming transition as a paradigm to understand the loss landscape of deep neural networks

TLDR
It is argued that in fully connected deep networks a phase transition delimits the over- and underparametrized regimes where fitting can or cannot be achieved, and observed that the ability of fully connected networks to fit random data is independent of their depth, an independence that appears to also hold for real data.
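
A minimal sketch of the kind of experiment behind this picture: sweep the width of a small fully connected network on randomly labeled data and watch the training error vanish once the model is sufficiently overparametrized. The architecture, sizes, and optimizer here are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch (illustrative): sweep the width of a small fully connected network
# on randomly labeled data; once the model is overparametrized enough, it fits the
# random labels perfectly and the training error drops to zero.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
P, d = 200, 20
X = rng.standard_normal((P, d))
y = rng.integers(0, 2, size=P)          # random labels: a pure memorization task

for width in (2, 4, 8, 16, 32, 64):
    net = MLPClassifier(hidden_layer_sizes=(width, width), max_iter=5000, random_state=0)
    net.fit(X, y)
    print(f"width {width:3d}: training error = {1 - net.score(X, y):.3f}")
```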

Exponential expressivity in deep neural networks through transient chaos

TLDR
The theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.
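
The "transient chaos" picture can be probed directly: propagate two nearly identical inputs through the same random tanh network and track their overlap layer by layer; in the chaotic regime the overlap decays with depth. A minimal sketch with illustrative gains (not the paper's code):

```python
# Minimal sketch (illustrative): two nearly identical inputs separate as they pass
# through a random tanh network in the chaotic regime -- their overlap (cosine
# similarity) decays layer by layer.
import numpy as np

rng = np.random.default_rng(0)
N, depth, sigma_w, sigma_b = 1000, 15, 2.5, 0.3   # wide layers; large sigma_w => chaotic

x1 = rng.standard_normal(N)
x2 = x1 + 1e-3 * rng.standard_normal(N)            # a tiny perturbation of x1

def overlap(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"layer  0: overlap = {overlap(x1, x2):.4f}")
for layer in range(1, depth + 1):
    W = sigma_w * rng.standard_normal((N, N)) / np.sqrt(N)
    b = sigma_b * rng.standard_normal(N)
    x1 = np.tanh(W @ x1 + b)                       # the SAME random weights act on both inputs
    x2 = np.tanh(W @ x2 + b)
    print(f"layer {layer:2d}: overlap = {overlap(x1, x2):.4f}")
```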

Understanding deep learning requires rethinking generalization

TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.

Weight space structure and internal representations: A direct approach to learning and generalization in multilayer neural networks.

TLDR
The results are exact in the limit of a large number of hidden units, showing that multilayer neural networks (MLN) are a class of exactly solvable models with a simple interpretation of replica symmetry breaking.

The space of interactions in neural network models

The typical fraction of the space of interactions between each pair of N Ising spins which solve the problem of storing a given set of p random patterns as N-bit spin configurations is considered.
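
Gardner's calculation concerns the typical volume of couplings J that store all p = αN random patterns, i.e. satisfy sign(J·ξ^μ) = s^μ for every pattern μ. Below is a brute-force sketch for a tiny N, estimating how the fraction of random Ising couplings that store all patterns shrinks with α; this is illustrative only and far from the N → ∞ limit treated in the paper.

```python
# Minimal sketch (brute force, tiny N): estimate the fraction of coupling vectors J
# that store all p = alpha*N random patterns, i.e. satisfy sign(J . xi^mu) = s^mu
# for every pattern mu.
import numpy as np

rng = np.random.default_rng(0)
N = 15                                           # keep N tiny: we just sample J at random
samples = 200_000

for alpha in (0.5, 1.0, 1.5, 2.0):
    p = int(alpha * N)
    xi = rng.choice([-1, 1], size=(p, N))        # random patterns
    s = rng.choice([-1, 1], size=p)              # random target outputs
    J = rng.choice([-1.0, 1.0], size=(samples, N))
    stable = (s * (J @ xi.T)) > 0                # pattern mu stored if s^mu * (J . xi^mu) > 0
    frac = np.mean(np.all(stable, axis=1))
    print(f"alpha = {alpha:3.1f}: fraction of J storing all patterns ~ {frac:.5f}")
```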

Jamming in Multilayer Supervised Learning Models.

TLDR
This Letter investigates multilayer neural networks (MLNN) learning random associations as models for continuous constraint satisfaction problems (CCSP) that could potentially define different jamming universality classes.

Optimal storage properties of neural network models

The authors calculate the number, p = αN, of random N-bit patterns that an optimal neural network can store, allowing a given fraction f of bit errors and with the condition that each right bit is…

The simplest model of jamming

TLDR
It is shown that isostaticity is not a sufficient condition for singular force and gap distributions, and universality is hypothesized for a large class of non-convex constraint satisfaction problems with continuous variables.
...