# Effect of Batch Learning in Multilayer Neural Networks

@inproceedings{Fukumizu1998EffectOB, title={Effect of Batch Learning in Multilayer Neural Networks}, author={Kenji Fukumizu}, booktitle={International Conference on Neural Information Processing}, year={1998} }

This paper discusses batch gradient descent learning in multilayer networks with a large amount of training data. We emphasize the difference between regular cases, where the prepared model has the same size as the true function, and overrealizable cases, where the model has surplus hidden units to realize the true function. First, an experimental study on multilayer perceptrons and linear neural networks (LNN) shows that batch learning induces strong overtraining on both models…
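The regular-vs-overrealizable contrast the abstract describes can be reproduced in miniature. The following is an illustrative numpy sketch, not the paper's experimental setup: a two-layer linear network trained by batch gradient descent on data from a one-hidden-unit teacher, once with a matched model and once with surplus hidden units; all sizes, seeds, and learning rates are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from a "true" rank-1 linear map, d inputs -> 1 output, with noisy targets.
d, n_train, n_test, noise = 5, 40, 1000, 0.3
a_true = rng.normal(size=(d, 1))
X_tr = rng.normal(size=(n_train, d))
y_tr = X_tr @ a_true + noise * rng.normal(size=(n_train, 1))
X_te = rng.normal(size=(n_test, d))
y_te = X_te @ a_true                      # noiseless test targets

def train(h, steps=2000, lr=0.05):
    """Batch gradient descent on a two-layer linear net with h hidden units."""
    W1 = 0.1 * rng.normal(size=(d, h))
    W2 = 0.1 * rng.normal(size=(h, 1))
    hist = []
    for _ in range(steps):
        err = X_tr @ W1 @ W2 - y_tr
        gW2 = W1.T @ X_tr.T @ err / n_train    # grad of MSE w.r.t. W2
        gW1 = X_tr.T @ err @ W2.T / n_train    # grad of MSE w.r.t. W1
        W1 -= lr * gW1
        W2 -= lr * gW2
        hist.append((np.mean(err ** 2), np.mean((X_te @ W1 @ W2 - y_te) ** 2)))
    return np.array(hist)                      # columns: train MSE, test MSE

regular = train(h=1)      # model size matches the true function
overreal = train(h=4)     # surplus hidden units (overrealizable case)
```

Plotting the second column of each history against training time is where the overtraining behavior the paper studies would show up, if it does for a given seed.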

## 21 Citations

### Dynamics of Batch Learning in Multilayer Neural Networks

- Computer Science
- 1998

We discuss the dynamics of batch learning of multilayer neural networks in the asymptotic limit, where the number of training data is much larger than the number of parameters, emphasizing the…

### High-dimensional dynamics of generalization error in neural networks

- Computer Science, Neural Networks
- 2020

### Layer Dynamics of Linearised Neural Nets

- Computer Science, ArXiv
- 2019

This work expands on and derives properties of the learning dynamics obeyed by general multi-layer linear neural nets, and shows how nonlinearity breaks down the growth symmetry observed in linear neural nets.

### Neural Networks as Kernel Learners: The Silent Alignment Effect

- Computer Science, ICLR
- 2022

In general, it is found that the kernel develops a low-rank contribution in the early phase of training, and then evolves in overall scale, yielding a function equivalent to a kernel regression solution with the final network’s tangent kernel.

### Minnorm training: an algorithm for training overcomplete deep neural networks

- Computer Science
- 2018

This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem wherein the sum of the norm of the weights in each layer of the network is minimized, under the constraint of exactly fitting training data.

### Minnorm training: an algorithm for training over-parameterized deep neural networks

- Computer Science, ArXiv
- 2018

This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem wherein the sum of the norm of the weights in each layer of the network is minimized, under the constraint of exactly fitting training data.
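In the linear, over-parameterized case, "minimize the weight norm under the constraint of exactly fitting the training data" has a closed form: the pseudoinverse yields the minimum-norm interpolating solution. A small sketch of that linear analogue (not the paper's actual algorithm; sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 100        # fewer samples than parameters: infinitely many interpolators
X = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))

# Minimum-norm solution among all w satisfying Xw = y
w = np.linalg.pinv(X) @ y

print(np.allclose(X @ w, y))   # exactly fits the training data
```

Any other interpolator differs from `w` by a null-space component orthogonal to it, so it necessarily has a larger norm; minnorm training generalizes this selection principle beyond the linear setting.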

### Principal Components Bias in Over-parameterized Linear Models, and its Manifestation in Deep Neural Networks

- Computer Science
- 2021

This work revisits the over-parametrized deep linear network model and reveals that, when the hidden layers are wide enough, the convergence rate of this model’s parameters is exponentially faster along the directions of the larger principal components of the data, at a rate governed by the corresponding singular values.
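The PC-bias in its simplest form is already visible in one-layer linear regression, where gradient descent contracts the error along each principal direction at a rate set by that direction's variance. A hedged single-layer sketch (the paper's claim concerns deep linear networks; dimensions, scales, and learning rate here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 2
# Data with very different variance along the two coordinate (principal) directions
X = rng.normal(size=(n, d)) * np.array([3.0, 0.3])
w_true = np.array([1.0, 1.0])
y = X @ w_true

w = np.zeros(d)
lr = 0.01
errs = []
for _ in range(200):
    grad = X.T @ (X @ w - y) / n   # batch gradient of the mean squared error
    w -= lr * grad
    errs.append(np.abs(w - w_true))  # per-direction parameter error
errs = np.array(errs)
```

The error along the high-variance direction decays roughly like (1 - lr * sigma^2)^t per step, so the first coordinate converges orders of magnitude faster than the second.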

### On the information bottleneck theory of deep learning

- Computer Science, ICLR
- 2018

This work studies the information bottleneck (IB) theory of deep learning, and finds that there is no evident causal connection between compression and generalization: networks that do not compress are still capable of generalization, and vice versa.

### Convergence Analysis of Over-parameterized Deep Linear Networks, and the Principal Components Bias

- Computer Science, ArXiv
- 2021

This work revisits the over-parametrized deep linear network model and shows how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently in earlier stages of learning.

### Supplementary Information : A mathematical theory of semantic development in deep neural networks

- Computer Science
- 2019

This work states that when a random subset of features is observed and error-corrective learning is applied only to the observed features, then if learning is gradual the average update will be equivalent to that from observing all features, up to a scale factor which can be absorbed into the learning rate.
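The scale-factor claim is easy to check in a linear error-corrective model: if each output feature is observed independently with probability p and the squared-error update is applied only to observed features, the update averaged over masks equals p times the full-observation update. A small numpy check (names and sizes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_out, p = 4, 6, 0.5
W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=(d_in, 1))
y = rng.normal(size=(d_out, 1))

err = W @ x - y
full_grad = err @ x.T                 # gradient with all output features observed

# Average the masked gradient over many random feature subsets,
# each output feature kept independently with probability p.
n_samples = 100_000
masks = rng.random((n_samples, d_out, 1)) < p
avg = ((masks * err) @ x.T).mean(axis=0)

print(np.max(np.abs(avg - p * full_grad)))  # small: average update = p * full update
```

The factor p can then be absorbed into the learning rate, which is exactly the equivalence the summary describes.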

## References

### Learning in linear neural networks: a survey

- Computer Science, IEEE Trans. Neural Networks
- 1995

Most of the known results on linear networks, including backpropagation learning and the structure of the error function landscape, the temporal evolution of generalization, and unsupervised learning algorithms and their properties are surveyed.

### A Regularity Condition of the Information Matrix of a Multilayer Perceptron Network

- Computer Science, Neural Networks
- 1996

### Special Statistical Properties of Neural Network Learning

- Computer Science
- 1997

Experimental results reveal that iterative learning of a neural network shows eminent overtraining and better generalization in the middle of training, while essential differences between feed-forward neural network models and conventional linear statistical models are elucidated.

### Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

- Computer Science, NIPS
- 1995

It is shown that the asymptotic gain in generalization error from early stopping is small, even with access to the optimal stopping time.

### Universal approximation bounds for superpositions of a sigmoidal function

- Computer Science, IEEE Trans. Inf. Theory
- 1993

The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order 1/n^{2/d} uniformly for functions satisfying the same smoothness assumption.
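The two rates contrasted in this summary can be written out explicitly; the constants below follow the usual statement of Barron's 1993 result from memory and should be checked against the paper:

```latex
% n-term sigmoidal approximation rate, where C_f bounds the first
% moment of the Fourier magnitude distribution of f:
\|f - f_n\|_{L^2(\mu)}^2 \;\le\; \frac{(2C_f)^2}{n}

% lower bound for any fixed n-dimensional linear basis in d input
% dimensions, uniformly over the same smoothness class:
\sup_{f}\, \inf_{\hat f_n} \|f - \hat f_n\|_{L^2(\mu)}^2
  \;\ge\; \frac{\kappa}{n^{2/d}}
```

The point of the contrast is that the O(1/n) rate for adaptive sigmoidal superpositions is dimension-free, while fixed bases suffer the curse of dimensionality through the 2/d exponent.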

### Global analysis of Oja's flow for neural networks

- Mathematics, IEEE Trans. Neural Networks
- 1994

The solution of Oja's equation is exponentially convergent to an equilibrium from any initial value and the necessary and sufficient conditions are given on the initial value for the solution to converge to a dominant eigenspace of the associated autocorrelation matrix.
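Oja's rule itself is a one-line update, and its convergence to the dominant eigenvector of the autocorrelation matrix is easy to observe numerically. A minimal sketch (learning rate, sample count, and covariance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Data whose autocorrelation matrix has a clearly dominant eigenvector (e_1)
C = np.diag([4.0, 1.0, 0.25])
X = rng.multivariate_normal(np.zeros(3), C, size=5000)

w = rng.normal(size=3)
w /= np.linalg.norm(w)
lr = 0.01
for x in X:
    y = w @ x
    w += lr * y * (x - y * w)   # Oja's constrained Hebbian update

top = np.array([1.0, 0.0, 0.0])  # dominant eigenvector of C
print(abs(w @ top) / np.linalg.norm(w))  # alignment with the dominant direction
```

The subtractive `y * w` term keeps the weight norm near 1, which is what distinguishes Oja's rule from plain Hebbian learning.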

### Simplified neuron model as a principal component analyzer

- Biology, Journal of Mathematical Biology
- 1982

A simple linear neuron model with constrained Hebbian-type synaptic modification is analyzed and a new class of unconstrained learning rules is derived. It is shown that the model neuron tends to…
