• Corpus ID: 54434925

Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks

@article{Li2018MeasureML,
  title={Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks},
  author={Shuai Li},
  journal={ArXiv},
  year={2018},
  volume={abs/1811.12783}
}
  • Shuai Li
  • Published 30 November 2018
  • Computer Science
  • ArXiv
We present a formal measure-theoretical theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows.
  • Built on the formalism of probability coupling theory, we derive an algorithm framework, named Hierarchical Measure Group and Approximate System (HMGAS), nicknamed S-System, that is designed to learn the complex hierarchical, statistical dependency in the physical world.
  • We show that NNs are special cases of S-System when the…

Citations

Towards a General Model of Knowledge for Facial Analysis by Multi-Source Transfer Learning

A lightweight student model that mimics the fused collection of existing models is obtained, achieving results on par with state-of-the-art methods on 15 facial analysis tasks (and domains) at an affordable training cost.
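
The summary above describes a form of multi-source knowledge distillation: several existing models are fused and a compact student is trained to mimic the fusion. As a minimal, purely illustrative sketch (made-up linear teachers, a soft-label cross-entropy objective, plain numpy; not the training setup of that paper), this is roughly what mimicking fused teachers looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical setup: three frozen linear "teachers" are fused by averaging
# their soft predictions, and a single linear "student" learns to mimic them.
n, d, k = 200, 16, 4
X = rng.normal(size=(n, d))
teachers = [rng.normal(size=(d, k)) for _ in range(3)]
soft_targets = np.mean([softmax(X @ W) for W in teachers], axis=0)

W_student = np.zeros((d, k))
lr = 0.5
for _ in range(500):
    P = softmax(X @ W_student)
    grad = X.T @ (P - soft_targets) / n      # gradient of soft-label cross-entropy
    W_student -= lr * grad

final = -(soft_targets * np.log(softmax(X @ W_student) + 1e-12)).sum(axis=1).mean()
print("distillation cross-entropy:", round(float(final), 4))
```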

Apprentissage neuronal profond pour l'analyse de contenus multimodaux et temporels. (Deep learning for multimodal and temporal contents analysis)

A more general neural representation is obtained from a single model, which gathers the knowledge contained in the pre-trained models and leads to state-of-the-art performance on a variety of facial analysis tasks.

References

Showing 1-10 of 135 references

Geometry of Neural Network Loss Surfaces via Random Matrix Theory

An analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy are introduced.
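
For intuition only, the eigenvalue spectrum of a loss Hessian can be computed numerically for a toy network; the paper characterizes this distribution analytically with random matrix theory, which the brute-force sketch below (a tiny two-layer network, finite-difference Hessian, random data) does not attempt to reproduce:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data and a tiny two-layer network with squared loss (illustration only).
X = rng.normal(size=(50, 3))
y = rng.normal(size=(50, 1))
shapes = [(3, 4), (4, 1)]
sizes = [int(np.prod(s)) for s in shapes]

def loss(theta):
    W1 = theta[:sizes[0]].reshape(shapes[0])
    W2 = theta[sizes[0]:].reshape(shapes[1])
    return 0.5 * np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)

theta = 0.5 * rng.normal(size=sum(sizes))

# Central-difference Hessian of the loss at theta, then its eigenvalue spectrum.
eps, p = 1e-4, len(theta)
H = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        t = theta.copy()
        t[i] += eps; t[j] += eps; f_pp = loss(t)
        t[j] -= 2 * eps; f_pm = loss(t)
        t[i] -= 2 * eps; f_mm = loss(t)
        t[j] += 2 * eps; f_mp = loss(t)
        H[i, j] = (f_pp - f_pm - f_mp + f_mm) / (4 * eps ** 2)

print("Hessian eigenvalues:", np.round(np.linalg.eigvalsh((H + H.T) / 2), 3))
```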

Open Problem: The landscape of the loss surfaces of multilayer networks

The question is whether it is possible to drop some of these assumptions to establish a stronger connection between both models.

Theory IIIb: Generalization in Deep Networks

It is proved that the weight matrix at each layer of a deep network converges, up to a scale factor, to a minimum-norm solution in the separable case. The analysis of the dynamical system corresponding to gradient descent on a multilayer network also suggests a simple criterion for ranking the generalization performance of different zero minimizers of the empirical loss.
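
The separable-case statement has a well-known one-layer analogue: for logistic regression on linearly separable data, plain gradient descent lets the weight norm grow without bound while the weight direction stabilizes toward the maximum-margin (minimum-norm) separator. The sketch below illustrates only that analogue, not the paper's multilayer dynamical-system analysis:

```python
import numpy as np

rng = np.random.default_rng(2)

# Linearly separable 2-D data with a margin around a fixed true direction.
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(400, 2))
X = X[np.abs(X @ w_true) > 0.5]
y = np.sign(X @ w_true)

# Plain gradient descent on the logistic loss: the norm of w keeps growing,
# but its direction settles toward the max-margin (minimum-norm) separator.
w = np.zeros(2)
lr = 0.1
for step in range(1, 100001):
    margins = y * (X @ w)
    grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)
    w -= lr * grad
    if step % 25000 == 0:
        print(step, "direction", np.round(w / np.linalg.norm(w), 4),
              "norm", round(float(np.linalg.norm(w)), 2))
```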

Expectation propagation: a probabilistic view of Deep Feed Forward Networks

The energy-based approach naturally explains several known results and heuristics, providing a solid theoretical framework and new instruments for the systematic development of FFNs, and it is found that ESP allows for faster training and more consistent performance over a wide range of network architectures.

The Loss Surfaces of Multilayer Networks

It is proved that recovering the global minimum becomes harder as the network size increases, and that this is in practice irrelevant, as the global minimum often leads to overfitting.

Why Does Deep and Cheap Learning Work So Well?

It is argued that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine learning, a deep neural network can be more efficient than a shallow one.

Intriguing properties of neural networks

It is found that there is no distinction between individual high-level units and random linear combinations of high-level units according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
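
The unit analysis referred to here can be imitated on synthetic data: compare the inputs that respond most strongly to a single coordinate unit with those that respond most strongly to a random linear combination of units. The snippet below uses random activations purely as a stand-in for a trained layer, so it only shows the shape of the experiment, not its conclusion:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in "high-level activations": rows are inputs, columns are units.
# (A real version of this analysis would use a trained network's layer.)
acts = rng.normal(size=(1000, 64))

def top_inputs(direction, k=5):
    """Indices of the inputs responding most strongly along a direction."""
    scores = acts @ (direction / np.linalg.norm(direction))
    return np.argsort(scores)[-k:][::-1]

single_unit = np.eye(64)[7]            # one "natural basis" unit
random_mix = rng.normal(size=64)       # a random linear combination of units

print("top inputs for unit 7:     ", top_inputs(single_unit))
print("top inputs for random mix: ", top_inputs(random_mix))
# The cited finding is that both kinds of direction give comparably
# interpretable input groups, i.e. the semantics live in the space of
# activations rather than in individual units.
```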

Information geometry of the EM and em algorithms for neural networks

  • S. Amari
  • Computer Science
  • Neural Networks
  • 1995

Singularities Affect Dynamics of Learning in Neuromanifolds

An overview is given of the phenomena caused by the singularities of statistical manifolds related to multilayer perceptrons and Gaussian mixtures, and the natural gradient method is shown to perform well because it takes the singular geometrical structure into account.
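
For reference, a natural gradient step preconditions the ordinary gradient with the inverse Fisher information matrix, theta <- theta - eta * F^{-1} grad. The sketch below applies this to plain logistic regression, a far simpler setting than the neuromanifolds studied in the cited work; the damping term is an implementation convenience and not part of that analysis:

```python
import numpy as np

rng = np.random.default_rng(4)

# Logistic regression as a simple statistical model; the natural gradient
# preconditions the ordinary gradient with the inverse Fisher information.
X = rng.normal(size=(300, 5))
w_star = rng.normal(size=5)
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-X @ w_star))).astype(float)

w = np.zeros(5)
lr, damping = 1.0, 1e-3
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - y) / len(y)                        # ordinary gradient
    F = X.T @ (X * (p * (1 - p))[:, None]) / len(y)      # Fisher information matrix
    w -= lr * np.linalg.solve(F + damping * np.eye(5), grad)  # natural gradient step

print("estimated weights: ", np.round(w, 3))
print("generating weights:", np.round(w_star, 3))
```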

Understanding the Loss Surface of Neural Networks for Binary Classification

This work focuses on the training performance of single-layered neural networks for binary classification, and provides conditions under which the training error is zero at all local minima of a smooth hinge loss function.
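
A "smooth hinge loss" can be realized in several ways; the quadratically smoothed (Huberized) hinge below is one standard choice, used here only as an assumption about what such a loss looks like, not necessarily the exact function analyzed in that work:

```python
import numpy as np

def smooth_hinge(margin, gamma=1.0):
    """Quadratically smoothed (Huberized) hinge loss of the margin y * f(x):
    zero for margins above 1, quadratic near the hinge point, and linear
    for strongly violated margins, so the loss is differentiable everywhere."""
    m = np.asarray(margin, dtype=float)
    return np.where(m >= 1.0, 0.0,
           np.where(m <= 1.0 - gamma, 1.0 - m - gamma / 2.0,
                    (1.0 - m) ** 2 / (2.0 * gamma)))

print(smooth_hinge(np.array([-1.0, 0.5, 0.9, 1.2])))
```
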
...