# Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks

@article{Li2018MeasureML, title={Measure, Manifold, Learning, and Optimization: A Theory Of Neural Networks}, author={Shuai Li}, journal={ArXiv}, year={2018}, volume={abs/1811.12783} }

We present a formal measure-theoretical theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows.
* Built on the formalism of probability coupling theory, we derive an algorithmic framework, named Hierarchical Measure Group and Approximate System (HMGAS) and nicknamed S-System, that is designed to learn the complex, hierarchical statistical dependencies in the physical world.
* We show that NNs are special cases of S-System when the…
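The abstract builds on probability coupling. As a generic illustration of that underlying notion only (not of the S-System itself): a coupling of two distributions p and q is a joint distribution whose marginals are p and q. The sketch below constructs the standard maximal coupling of two distributions on a finite set, which makes P(X = Y) as large as possible, namely 1 − TV(p, q). The function name `maximal_coupling` is ours.

```python
from fractions import Fraction

def maximal_coupling(p, q):
    """Return a joint table pi[i][j] with row sums p and column sums q
    that puts as much mass as possible on the diagonal (X == Y)."""
    n = len(p)
    overlap = [min(a, b) for a, b in zip(p, q)]
    shared = sum(overlap)              # equals 1 - TV(p, q)
    pi = [[0 for _ in range(n)] for _ in range(n)]
    for i in range(n):
        pi[i][i] = overlap[i]          # agree whenever both distributions allow it
    tv = 1 - shared
    if tv > 0:
        rp = [a - o for a, o in zip(p, overlap)]  # leftover mass of p
        rq = [b - o for b, o in zip(q, overlap)]  # leftover mass of q
        # Spread the leftovers independently; rp[i] and rq[i] are never both
        # positive, so this adds no extra mass to the diagonal.
        for i in range(n):
            for j in range(n):
                pi[i][j] += rp[i] * rq[j] / tv
    return pi

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
pi = maximal_coupling(p, q)
```

Using `Fraction` keeps the marginal checks exact; here TV(p, q) = 1/4, so the diagonal carries mass 3/4.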

## 2 Citations

### Towards a General Model of Knowledge for Facial Analysis by Multi-Source Transfer Learning

- Computer Science
- ArXiv
- 2019

A lightweight student model mimicking the collection of the fused existing models is obtained, achieving results on par with state-of-the-art methods on 15 facial analysis tasks (and domains), at an affordable training cost.
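The summary does not say how the existing models are fused. As a generic illustration only, here is a minimal multi-teacher knowledge-distillation loss, assuming the teachers' temperature-softened predictions are simply averaged into one target and the student is trained to match it under KL divergence. The averaging rule and all function names are assumptions; the T² scaling follows the usual convention from Hinton et al.'s distillation.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def fused_teacher_targets(teacher_logits, temperature=2.0):
    # Hypothetical fusion rule: average the softened predictions of all teachers.
    probs = [softmax(t, temperature) for t in teacher_logits]
    k = len(probs)
    return [sum(p[c] for p in probs) / k for c in range(len(probs[0]))]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(target || student) on softened distributions, scaled by T^2.
    target = fused_teacher_targets(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)
    return temperature ** 2 * kl
```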

### Apprentissage neuronal profond pour l'analyse de contenus multimodaux et temporels. (Deep learning for multimodal and temporal contents analysis)

- Philosophy, Computer Science
- 2019

A more general neural representation is obtained from a single model that gathers the knowledge contained in the pre-trained models and leads to state-of-the-art performance on a variety of facial analysis tasks.

## References

Showing 10 of 135 references.

### Geometry of Neural Network Loss Surfaces via Random Matrix Theory

- Computer Science
- ICML
- 2017

This work introduces an analytical framework and a set of tools from random matrix theory for approximating the distribution of eigenvalues of the Hessian matrix at critical points of varying energy.
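As background for this entry, a minimal sketch of the decomposition such analyses start from, assuming a squared loss L(θ) = (1/2N) Σᵢ eᵢ(θ)² with residuals eᵢ = f(xᵢ; θ) − yᵢ (notation ours): the Hessian splits into a positive semidefinite Gauss–Newton term H₀ and a residual-weighted term H₁. As we understand the paper, H₀ is modeled as a Wishart matrix and H₁ as a Wigner matrix, whose limiting spectral densities are the Marchenko–Pastur law (shape ratio φ ∈ (0, 1]) and the semicircle law, and the two are then combined with free probability.

```latex
\[
\nabla^2 \mathcal{L}(\theta)
  = \underbrace{\frac{1}{N}\sum_{i=1}^{N} \nabla e_i \,\nabla e_i^{\top}}_{H_0}
  + \underbrace{\frac{1}{N}\sum_{i=1}^{N} e_i \,\nabla^2 e_i}_{H_1}
\]
\[
\rho_{\mathrm{SC}}(\lambda) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - \lambda^2},
  \qquad |\lambda| \le 2\sigma
\]
\[
\rho_{\mathrm{MP}}(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi\sigma^2\phi\,\lambda},
  \qquad \lambda_\pm = \sigma^2\left(1 \pm \sqrt{\phi}\right)^2,\quad \lambda \in [\lambda_-, \lambda_+]
\]
```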

### Open Problem: The landscape of the loss surfaces of multilayer networks

- Computer Science
- COLT
- 2015

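It asks whether some of the strong assumptions used to relate the loss function of a multilayer network to the Hamiltonian of the spherical spin-glass model can be dropped, so as to establish a stronger connection between the two models.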

### Theory IIIb: Generalization in Deep Networks

- Computer Science
- ArXiv
- 2018

It is proved that, in the separable case, the weight matrix at each layer of a deep network converges to a minimum-norm solution up to a scale factor. The analysis of the dynamical system corresponding to gradient descent on a multilayer network also suggests a simple criterion for ranking the generalization performance of different zero minimizers of the empirical loss.
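The minimum-norm phenomenon has a simple, well-known linear analogue, which we substitute here for the paper's separable classification setting: gradient descent on an underdetermined least-squares problem started from w = 0 never leaves the row space of X and therefore converges to the minimum-norm interpolating solution. A minimal pure-Python sketch (the function `gd_least_squares` and the toy data are ours):

```python
def gd_least_squares(X, y, lr=0.1, steps=5000):
    """Gradient descent on 0.5 * ||X w - y||^2, starting from w = 0."""
    n_rows, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # residual r = X w - y
        r = [sum(x * wj for x, wj in zip(row, w)) - yi for row, yi in zip(X, y)]
        # gradient X^T r lies in the row space of X, so w stays there
        grad = [sum(r[i] * X[i][j] for i in range(n_rows)) for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
w = gd_least_squares(X, y)
```

Both `[0, 1, 1]` and `[1, 2, 0]` fit this data exactly, but gradient descent from zero returns the first, which has the smaller norm.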

### Expectation propagation: a probabilistic view of Deep Feed Forward Networks

- Computer Science
- ArXiv
- 2018

The energy-based approach naturally explains several known results and heuristics, providing a solid theoretical framework and new instruments for the systematic development of feed-forward networks (FFNs); the authors also find that ESP allows for faster training and more consistent performance over a wide range of network architectures.

### The Loss Surfaces of Multilayer Networks

- Computer Science
- AISTATS
- 2015

It is proved that recovering the global minimum becomes harder as the network size increases, and that doing so is irrelevant in practice, as the global minimum often leads to overfitting.
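For reference, the spherical spin-glass model to which this line of work relates the loss of a depth-H network has a Hamiltonian of the following form (written from the spin-glass literature as we recall it; Λ is the number of spins, standing in for network weights, and the couplings X are i.i.d. standard Gaussian):

```latex
\[
\mathcal{L}_{\Lambda,H}(w) = \frac{1}{\Lambda^{(H-1)/2}}
  \sum_{i_1,\dots,i_H=1}^{\Lambda} X_{i_1,\dots,i_H}\, w_{i_1} w_{i_2} \cdots w_{i_H},
\qquad \text{subject to} \quad \frac{1}{\Lambda}\sum_{i=1}^{\Lambda} w_i^2 = 1.
\]
```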

### Why Does Deep and Cheap Learning Work So Well?

- Computer Science
- ArXiv
- 2016

It is argued that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine learning, a deep neural network can be more efficient than a shallow one.
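The efficiency argument rests on small building blocks such as multiplication. A minimal sketch in the spirit of the paper's multiplication gate, assuming a softplus activation (picked here because its second derivative at 0 is 1/4, whereas the logistic sigmoid's is 0): four hidden neurons recover x·y from the even part of the activation's Taylor expansion, with error shrinking as the input scale `lam` goes to 0. The names `approx_product` and `lam` are ours.

```python
import math

def softplus(z):
    # softplus(z) = log(1 + e^z); its second derivative at 0 is 1/4.
    return math.log1p(math.exp(z))

SOFTPLUS_SECOND_DERIV_AT_0 = 0.25

def approx_product(x, y, lam=1e-2):
    """Approximate x * y with four softplus neurons in one hidden layer."""
    a = lam * (x + y)
    b = lam * (x - y)
    s = softplus(a) + softplus(-a) - softplus(b) - softplus(-b)
    # Odd Taylor terms cancel: s ~ sigma''(0) * (a^2 - b^2) = 4 * lam^2 * sigma''(0) * x * y
    return s / (4 * lam ** 2 * SOFTPLUS_SECOND_DERIV_AT_0)
```

For example, `approx_product(1.5, 0.7)` lands within about 1e-4 of 1.05; smaller `lam` reduces the truncation error until floating-point cancellation takes over.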

### Intriguing properties of neural networks

- Computer Science
- ICLR
- 2014

It is found that there is no distinction between individual high-level units and random linear combinations of high-level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

### Information geometry of the EM and em algorithms for neural networks

- Computer Science
- Neural Networks
- 1995

### Singularities Affect Dynamics of Learning in Neuromanifolds

- Mathematics
- Neural Comput.
- 2006

An overview of the phenomena caused by the singularities of statistical manifolds related to multilayer perceptrons and Gaussian mixtures is given, and the natural gradient method is shown to perform well because it takes the singular geometrical structure into account.
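The natural gradient method preconditions the ordinary gradient by the inverse Fisher information, θ ← θ − η F(θ)⁻¹ ∇L(θ). A minimal sketch on a deliberately regular model, a univariate Gaussian N(mu, sigma²) fitted by maximum likelihood (our choice of example; it has no singularities, so it shows only the update rule, not the singular behavior the paper studies). In the (mu, sigma) coordinates the Fisher matrix is diag(1/sigma², 2/sigma²), so the preconditioning is elementwise. The function `natural_gradient_fit` is hypothetical.

```python
def natural_gradient_fit(data, mu=0.0, sigma=1.0, lr=1.0, steps=50):
    """Fit N(mu, sigma^2) by natural-gradient descent on the mean negative log-likelihood."""
    n = len(data)
    mean = sum(data) / n
    for _ in range(steps):
        second = sum((x - mu) ** 2 for x in data) / n  # E[(x - mu)^2] at the current mu
        # Euclidean gradient of mean NLL = log(sigma) + E[(x - mu)^2] / (2 sigma^2)
        g_mu = -(mean - mu) / sigma ** 2
        g_sigma = 1.0 / sigma - second / sigma ** 3
        # Multiply by the inverse Fisher matrix diag(sigma^2, sigma^2 / 2)
        nat_mu = sigma ** 2 * g_mu
        nat_sigma = (sigma ** 2 / 2.0) * g_sigma
        mu, sigma = mu - lr * nat_mu, sigma - lr * nat_sigma
    return mu, sigma

mu_hat, sigma_hat = natural_gradient_fit([1, 2, 3, 4, 5])
```

With `lr=1.0` the mean jumps to the sample mean in one step, and the sigma update reduces to sigma ← (sigma + s/sigma)/2 with s the sample variance (Heron's square-root iteration), so the fit converges to (3.0, √2).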

### Understanding the Loss Surface of Neural Networks for Binary Classification

- Computer Science
- ICML
- 2018

This work focuses on the training performance of single-layered neural networks for binary classification, and provides conditions under which the training error is zero at all local minima of a smooth hinge loss function.
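The summary does not define the smooth hinge loss. As an assumed stand-in, the squared hinge (max(0, 1 − z))² on the margin z = y·f(x) is one common differentiable variant, possibly different from the paper's. The sketch also shows why a zero empirical loss forces zero training error: every margin must then be at least 1. All function names are ours.

```python
def squared_hinge(margin):
    # (max(0, 1 - margin))^2: differentiable everywhere, zero once margin >= 1
    return max(0.0, 1.0 - margin) ** 2

def squared_hinge_grad(margin):
    # derivative with respect to the margin; continuous at margin = 1
    return -2.0 * max(0.0, 1.0 - margin)

def training_error(labels, scores):
    # fraction of points misclassified, i.e. with y * f(x) <= 0
    return sum(1 for y, s in zip(labels, scores) if y * s <= 0) / len(labels)

def empirical_loss(labels, scores):
    return sum(squared_hinge(y * s) for y, s in zip(labels, scores)) / len(labels)
```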