• Corpus ID: 605683

Effect of Batch Learning in Multilayer Neural Networks

@inproceedings{Fukumizu1998EffectOB,
  title={Effect of Batch Learning in Multilayer Neural Networks},
  author={Kenji Fukumizu},
  booktitle={International Conference on Neural Information Processing},
  year={1998}
}
  • K. Fukumizu · International Conference on Neural Information Processing, 1998 · Computer Science
This paper discusses batch gradient descent learning in multilayer networks with a large number of training data. We emphasize the difference between regular cases, where the prepared model has the same size as the true function, and overrealizable cases, where the model has surplus hidden units to realize the true function. First, an experimental study on multilayer perceptrons and linear neural networks (LNN) shows that batch learning induces strong overtraining on both models…
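
To make the setting concrete, here is an illustrative sketch (not the paper's experiment): a student MLP with surplus hidden units is trained by full-batch gradient descent on data from a smaller tanh teacher, while training and test error are tracked so that overtraining can be observed. All sizes, rates, and the teacher itself are arbitrary choices for this sketch.

```python
# Sketch only: full-batch gradient descent on an over-realizable student MLP,
# tracking training vs. test error.
import numpy as np

rng = np.random.default_rng(0)

def mlp(X, W1, b1, w2):
    """Two-layer tanh network with scalar output."""
    return np.tanh(X @ W1 + b1) @ w2

# Teacher: 1 hidden unit; student: 5 hidden units (surplus capacity).
d, h_true, h_model, n_train, n_test = 2, 1, 5, 100, 1000
W1_t = rng.normal(size=(d, h_true)); b1_t = rng.normal(size=h_true)
w2_t = rng.normal(size=h_true)

def make_data(n, noise=0.1):
    X = rng.normal(size=(n, d))
    return X, mlp(X, W1_t, b1_t, w2_t) + noise * rng.normal(size=n)

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)

# Student parameters, small random initialization
W1 = 0.1 * rng.normal(size=(d, h_model)); b1 = np.zeros(h_model)
w2 = 0.1 * rng.normal(size=h_model)

lr = 0.05
for step in range(20001):
    H = np.tanh(Xtr @ W1 + b1)              # hidden activations, (n, h)
    err = H @ w2 - ytr                      # residuals
    # Gradients of (1/2) * mean squared error
    g_w2 = H.T @ err / n_train
    g_H = np.outer(err, w2) * (1 - H**2)
    W1 -= lr * (Xtr.T @ g_H) / n_train
    b1 -= lr * g_H.mean(axis=0)
    w2 -= lr * g_w2
    if step % 5000 == 0:
        test_err = np.mean((mlp(Xte, W1, b1, w2) - yte)**2)
        print(f"step {step:6d}  train MSE {np.mean(err**2):.4f}  test MSE {test_err:.4f}")
```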

Dynamics of Batch Learning in Multilayer Neural Networks

We discuss the dynamics of batch learning of multilayer neural networks in the asymptotic limit, where the number of training data is much larger than the number of parameters, emphasizing the…

Layer Dynamics of Linearised Neural Nets

This work expands on and derives properties of the learning dynamics obeyed by general multi-layer linear neural nets, and shows how nonlinearity breaks the growth symmetry observed in linear neural nets.
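
The growth symmetry referred to there can be checked numerically: under gradient descent with a small step size, a two-layer linear net (approximately) conserves W2 W2^T - W1^T W1. A minimal sketch under that assumption, with arbitrary toy data:

```python
# Sketch: the "balancedness" W2 W2^T - W1^T W1 is (nearly) conserved by
# small-step gradient descent on a two-layer linear network.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid, d_out, n = 4, 6, 3, 200

X = rng.normal(size=(n, d_in))
A = rng.normal(size=(d_in, d_out))            # target linear map
Y = X @ A

W1 = 0.1 * rng.normal(size=(d_in, d_hid))
W2 = 0.1 * rng.normal(size=(d_hid, d_out))

def imbalance(W1, W2):
    return np.linalg.norm(W2 @ W2.T - W1.T @ W1)

lr = 1e-3
print("initial imbalance:", imbalance(W1, W2))
for _ in range(5000):
    E = X @ W1 @ W2 - Y                       # residuals, (n, d_out)
    g_W2 = (X @ W1).T @ E / n
    g_W1 = X.T @ (E @ W2.T) / n
    W1 -= lr * g_W1
    W2 -= lr * g_W2
print("final loss:", np.mean((X @ W1 @ W2 - Y)**2))
print("final imbalance:", imbalance(W1, W2))  # stays close to its initial value
```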

Neural Networks as Kernel Learners: The Silent Alignment Effect

In general, it is found that the kernel develops a low-rank contribution in the early phase of training, and then evolves in overall scale, yielding a function equivalent to a kernel regression solution with the final network’s tangent kernel.
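
As a rough illustration of the kind of comparison involved (not the paper's setup, and the silent-alignment conditions are not enforced carefully here), the sketch below computes the empirical tangent kernel of a small tanh network by hand and compares the trained network's predictions with kernel regression under the final kernel:

```python
# Sketch: empirical (finite-width) tangent kernel of a small tanh network,
# compared with kernel regression under the final kernel.  Toy sizes, toy
# target; close agreement is only expected when the alignment effect holds.
import numpy as np

rng = np.random.default_rng(2)
d, h, n = 3, 50, 40

def forward(X, W1, b1, w2):
    return np.tanh(X @ W1 + b1) @ w2

def jacobian(X, W1, b1, w2):
    """Per-sample gradient of the scalar output w.r.t. all parameters."""
    H = np.tanh(X @ W1 + b1)                 # (n, h)
    G = (1 - H**2) * w2                      # dOutput/dPreactivation, (n, h)
    J_W1 = X[:, :, None] * G[:, None, :]     # (n, d, h)
    return np.concatenate([J_W1.reshape(len(X), -1), G, H], axis=1)

def ntk(Xa, Xb, params):
    return jacobian(Xa, *params) @ jacobian(Xb, *params).T

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                          # simple scalar target

W1 = 0.3 / np.sqrt(d) * rng.normal(size=(d, h)); b1 = np.zeros(h)
w2 = 0.3 / np.sqrt(h) * rng.normal(size=h)
lr = 0.1
for _ in range(30000):                       # full-batch gradient descent
    H = np.tanh(X @ W1 + b1)
    err = H @ w2 - y
    G = np.outer(err, w2) * (1 - H**2)
    W1 -= lr * (X.T @ G) / n
    b1 -= lr * G.mean(axis=0)
    w2 -= lr * (H.T @ err) / n

params = (W1, b1, w2)
Xte = rng.normal(size=(10, d))
K = ntk(X, X, params)
k = ntk(Xte, X, params)
f_net = forward(Xte, *params)
f_kernel = k @ np.linalg.solve(K + 1e-6 * np.eye(n), y)
print(np.c_[f_net, f_kernel])                # compare the two predictors
```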

Minnorm training: an algorithm for training overcomplete deep neural networks

This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem wherein the sum of the norm of the weights in each layer of the network is minimized, under the constraint of exactly fitting training data.
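
For intuition only (this is the linear analogue, not the paper's algorithm for deep networks), the minimum-norm-under-exact-fit objective has a closed-form solution for an over-parameterized linear model, given by the pseudoinverse:

```python
# Sketch: the linear analogue of minimizing weight norm under exact fitting.
# With more parameters than samples, min ||w|| s.t. Xw = y is the
# pseudoinverse solution; any other interpolant has strictly larger norm.
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 100                               # fewer samples than parameters
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

w_min = np.linalg.pinv(X) @ y                # minimum-norm exact interpolant
print("fit error:", np.linalg.norm(X @ w_min - y))    # ~0
print("norm:     ", np.linalg.norm(w_min))

# Adding any null-space component keeps the exact fit but increases the norm.
v = rng.normal(size=p)
v -= X.T @ np.linalg.solve(X @ X.T, X @ v)   # project v onto the null space of X
w_other = w_min + v
print("fit error:", np.linalg.norm(X @ w_other - y))  # still ~0
print("norm:     ", np.linalg.norm(w_other))          # larger
```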

Principal Components Bias in Over-parameterized Linear Models, and its Manifestation in Deep Neural Networks

This work revisits the over-parameterized deep linear network model and reveals that, when the hidden layers are wide enough, the model's parameters converge exponentially faster along the directions of the larger principal components of the data, at rates governed by the corresponding singular values.
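
The effect is already visible in a single-layer model, which the sketch below uses for illustration (the paper's analysis concerns the deep, over-parameterized case): gradient descent on linear regression drives the error down fastest along the leading principal components of the data.

```python
# Sketch: gradient descent on plain linear regression converges faster along
# the leading principal components of the data (single-layer illustration).
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 5
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.25])   # strongly decaying PC scales
X = rng.normal(size=(n, d)) * scales
w_true = rng.normal(size=d)
y = X @ w_true

# Principal directions of the (centered) data
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

w = np.zeros(d)
lr = 0.01
for step in range(2001):
    w -= lr * X.T @ (X @ w - y) / n
    if step % 500 == 0:
        err_pc = Vt @ (w - w_true)               # error along each PC direction
        print(step, np.round(np.abs(err_pc), 4))
# The leading entries (largest principal components) shrink first.
```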

On the information bottleneck theory of deep learning

This work studies the information bottleneck (IB) theory of deep learning, and finds that there is no evident causal connection between compression and generalization: networks that do not compress are still capable of generalization, and vice versa.

Convergence Analysis of Over-parameterized Deep Linear Networks, and the Principal Components Bias

This work revisits the over-parameterized deep linear network model and shows how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently in the earlier stages of learning.

Supplementary Information : A mathematical theory of semantic development in deep neural networks

This work states that when a random subset of features is observed and error-corrective learning is applied only to the observed features, then if learning is gradual the average update will be equivalent to that from observing all features, up to a scale factor which can be absorbed into the learning rate.
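
A quick numerical check of that statement in the simplest setting (a linear map with squared error, which is an assumption of this sketch rather than the paper's full model): masking output features independently with probability p and averaging the resulting error-corrective updates recovers the full-observation update scaled by p.

```python
# Sketch: with a linear map and squared error, updating only a random subset
# of output features gives, on average, the full update scaled by the
# observation probability p (which can be absorbed into the learning rate).
import numpy as np

rng = np.random.default_rng(5)
d_in, d_out, p, trials = 4, 6, 0.3, 200000

W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)
y = rng.normal(size=d_out)
e = W @ x - y                                    # prediction error

full_grad = np.outer(e, x)                       # gradient of (1/2)||Wx - y||^2

M = rng.random((trials, d_out)) < p              # random observed-feature masks
masked_grad = np.outer((M * e).mean(axis=0), x)  # average masked update

print(np.max(np.abs(masked_grad - p * full_grad)))  # ~0 up to Monte-Carlo noise
```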

References

Learning in linear neural networks: a survey

Most of the known results on linear networks, including backpropagation learning and the structure of the error function landscape, the temporal evolution of generalization, and unsupervised learning algorithms and their properties are surveyed.

A Regularity Condition of the Information Matrix of a Multilayer Perceptron Network

Special Statistical Properties of Neural Network Learning

Experimental results reveal that iterative learning of a neural network shows pronounced overtraining and better generalization in the middle of training, while essential differences between feed-forward neural network models and conventional linear statistical models are elucidated.

Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

It is shown that the asymptotic gain in generalization error from early stopping is small, even with access to the optimal stopping time.

Universal approximation bounds for superpositions of a sigmoidal function

  • A. Barron · IEEE Trans. Inf. Theory, 1993 · Computer Science
The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order (1/n)^{2/d} uniformly for functions satisfying the same smoothness assumption.
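
For reference, the two rates contrasted in that summary can be written as follows; this is an indicative restatement with constants omitted, not a quotation of the paper's theorems.

```latex
% Indicative restatement: n = number of units / basis terms, d = input
% dimension, C_f = the Fourier-moment constant of the target f.
\[
  \int (f - f_n)^2 \, d\mu \;=\; O\!\left(\frac{C_f^{2}}{n}\right)
  \quad \text{for some $n$-unit sigmoidal network } f_n,
  \qquad
  \sup_{f}\; \inf_{g \,\in\, \operatorname{span}\{\phi_1,\dots,\phi_n\}}
  \int (f - g)^2 \, d\mu \;\gtrsim\; \left(\tfrac{1}{n}\right)^{2/d}
  \quad \text{for any fixed basis } \{\phi_i\}.
\]
```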

Global analysis of Oja's flow for neural networks

The solution of Oja's equation is shown to converge exponentially to an equilibrium from any initial value, and necessary and sufficient conditions on the initial value are given for the solution to converge to the dominant eigenspace of the associated autocorrelation matrix.

Simplified neuron model as a principal component analyzer

  • E. Oja · Journal of Mathematical Biology, 1982 · Biology
A simple linear neuron model with constrained Hebbian-type synaptic modification is analyzed and a new class of unconstrained learning rules is derived. It is shown that the model neuron tends to extract the principal component from a stationary input vector sequence.
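
Oja's rule itself is compact enough to simulate directly; the sketch below (with an arbitrary toy input distribution) shows the weight vector converging to unit norm and aligning with the leading principal component.

```python
# Sketch: Oja's rule  w <- w + eta * y * (x - y * w),  with  y = w . x,
# drives w toward the leading principal component of the input covariance.
import numpy as np

rng = np.random.default_rng(6)
d = 5
scales = np.array([3.0, 1.0, 0.8, 0.5, 0.3])  # leading PC along the first axis

w = rng.normal(size=d)
w /= np.linalg.norm(w)
eta = 0.005
for _ in range(20000):
    x = rng.normal(size=d) * scales           # stream of input vectors
    y = w @ x
    w += eta * y * (x - y * w)

print("|w| =", np.linalg.norm(w))                          # -> ~1
print("alignment with e1:", abs(w[0]) / np.linalg.norm(w)) # -> ~1
```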

Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

  • Advances in Neural Information Processing Systems 8, pp. 176-182
  • 1996