Deep frequency principle towards understanding why deeper learning is faster

@article{Xu2020DeepFP,
  title={Deep frequency principle towards understanding why deeper learning is faster},
  author={Zhi-Qin John Xu and Hanxu Zhou},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.14313}
}
Understanding the effect of depth in deep learning is a critical problem. In this work, we use Fourier analysis to empirically provide a promising mechanism for understanding why deeper feedforward learning is faster. To this end, we separate a deep neural network, trained by normal stochastic gradient descent, into two parts during analysis, i.e., a pre-condition component and a learning component, where the output of the pre-condition component is the input of the learning component. We use a… 
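A minimal PyTorch sketch (not the paper's code; the architecture, split point, 1-D target, and all names are illustrative assumptions) of the decomposition the abstract describes: a network trained by ordinary SGD whose first blocks are viewed as the pre-condition component and whose remaining blocks as the learning component.

```python
# Minimal sketch: a feedforward net trained by ordinary SGD, viewed as a
# "pre-condition component" (first blocks) feeding a "learning component"
# (remaining blocks). Split index, widths, and target are illustrative.
import torch
import torch.nn as nn

class SplitMLP(nn.Module):
    def __init__(self, dims=(1, 64, 64, 64, 1), split=2):
        super().__init__()
        layers = []
        for i in range(len(dims) - 1):
            layers.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                layers.append(nn.Tanh())
        self.pre = nn.Sequential(*layers[: 2 * split])    # pre-condition component
        self.learn = nn.Sequential(*layers[2 * split:])   # learning component

    def forward(self, x):
        h = self.pre(x)          # output of the pre-condition component
        return self.learn(h), h  # the analysis inspects the map h -> y

x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(2 * torch.pi * x) + 0.5 * torch.sin(10 * torch.pi * x)

model = SplitMLP()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for step in range(2000):
    pred, h = model(x)
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    # During training one can record h and study the frequency content of the
    # effective target h -> y, which the paper argues becomes lower-frequency
    # (hence easier to fit) as the pre-condition component gets deeper.
```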

Citations

Frequency Principle in Deep Learning Beyond Gradient-descent-based Training

Empirical studies show the universality of the F-Principle in the training process of DNNs trained without gradient descent, including algorithms that use no gradient information at all, such as Powell's method and Particle Swarm Optimization.
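A hedged sketch of the kind of experiment this summary describes, assuming a tiny one-hidden-layer network, a two-frequency 1-D target, and SciPy's Powell optimizer; the setup is illustrative rather than the cited paper's.

```python
# Minimal sketch of checking the F-Principle under gradient-free training
# (Powell's method); target mixes one low and one high frequency.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 128)
y = np.sin(np.pi * x) + np.sin(5 * np.pi * x)   # 0.5 and 2.5 cycles per unit

H = 20  # hidden width; parameters packed into one flat vector for scipy
def net(p, x):
    w1, b1, w2 = p[:H], p[H:2 * H], p[2 * H:3 * H]
    return np.tanh(np.outer(x, w1) + b1) @ w2

def loss(p):
    return np.mean((net(p, x) - y) ** 2)

history = []
minimize(loss, rng.normal(scale=0.5, size=3 * H), method="Powell",
         callback=lambda p: history.append(net(p, x).copy()),
         options={"maxiter": 50})

# Compare residual energy at the low and high target frequencies over the run.
freqs = np.fft.rfftfreq(x.size, d=x[1] - x[0])
k_lo, k_hi = np.argmin(abs(freqs - 0.5)), np.argmin(abs(freqs - 2.5))
for t, pred in enumerate(history[::10]):
    r = np.abs(np.fft.rfft(pred - y))
    print(f"checkpoint {t}: |residual| low-freq {r[k_lo]:.3f}, high-freq {r[k_hi]:.3f}")
```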

A Computable Definition of the Spectral Bias

Neural networks have a bias towards low-frequency functions. This spectral bias has been the subject of several previous studies, both empirical and theoretical. Here we present a computable definition of the spectral bias.

Solving Multi-Dimensional Schrödinger Equations Based on EPINNs

A novel numerical method that uses a neural network to solve the multi-dimensional static Schrödinger equation and other high-dimensional eigenvalue problems for multi-electron atoms and molecules is proposed.

Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

An embedding principle in depth is discovered: the loss landscape of an NN "contains" all critical points of the loss landscapes of shallower NNs. This serves as a solid foundation for further study of the role of depth in DNNs.

Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width

The phase diagram suggests complicated training dynamics consisting of three possible regimes, together with their mixtures, for deep NNs, and provides guidance for studying deep NNs in different initialization regimes, revealing the possibility of completely different dynamics emerging within a deep NN for its different layers.

Limitation of characterizing implicit regularization by data-independent functions

This work makes an attempt to mathematically define and study implicit regularization, and proposes two dynamical mechanisms, the Two-point and One-point Overlapping mechanisms, based on which two recipes are provided for producing classes of one-hidden-neuron NNs that provably cannot be fully characterized by a particular type of, or even by all, data-independent functions.

Overview frequency principle/spectral bias in deep learning

An overview of the F-Principle is provided and some open problems for future research are proposed; the frequency perspective inspires the design of DNN-based algorithms for practical problems, explains experimental phenomena emerging in various scenarios, and further advances the study of deep learning.

Subspace Decomposition based DNN algorithm for elliptic type multi-scale PDEs

A subspace-decomposition-based DNN architecture (dubbed SDNN) is constructed for a class of multi-scale problems by combining ideas from traditional numerical analysis with MscaleDNN algorithms.

Going Deeper in Frequency Convolutional Neural Network: A Theoretical Perspective

Fourier transform theory is revisited to derive the feed-forward and back-propagation frequency-domain operations of typical network modules such as convolution, activation, and pooling; the analysis is then extended to the Laplace transform for CNNs, which can operate in the real domain with more relaxed constraints.
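The frequency-domain operations mentioned above rest on the convolution theorem; a quick NumPy check of that textbook fact (not code from the cited paper) is sketched below.

```python
# Numerical check of the convolution theorem: circular convolution in the
# spatial domain equals pointwise multiplication of DFTs in the frequency domain.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=64)        # "feature map" (1-D for simplicity)
k = rng.normal(size=64)        # "filter", zero-padded to the same length

circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))   # frequency route
direct = np.array([sum(x[j] * k[(i - j) % 64] for j in range(64))
                   for i in range(64)])                        # spatial route

print(np.allclose(circ, direct))   # True: the two routes agree
```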

Towards Understanding the Condensation of Neural Networks at Initial Training

This work illustrates the formation of condensation in multi-layer fully connected NNs and shows that the maximal number of condensed orientations in the initial training stage is twice the multiplicity of the activation function, where "multiplicity" refers to the multiplicity of the zero of the activation function at the origin.

References

Showing 1-10 of 36 references

Training behavior of deep neural network in frequency domain

For both real and synthetic datasets, it is empirically found that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures the high-frequency ones.
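A minimal sketch (illustrative architecture and target, not the cited paper's experiments) of the behaviour described above: during ordinary SGD training, the residual at the low target frequency shrinks well before the residual at the high one.

```python
# Track per-frequency convergence of a small MLP trained by SGD on a
# two-frequency 1-D target; setup is illustrative.
import numpy as np
import torch
import torch.nn as nn

x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(np.pi * x) + torch.sin(5 * np.pi * x)   # 0.5 and 2.5 cycles per unit

net = nn.Sequential(nn.Linear(1, 200), nn.Tanh(),
                    nn.Linear(200, 200), nn.Tanh(),
                    nn.Linear(200, 1))
opt = torch.optim.SGD(net.parameters(), lr=5e-3)

freqs = np.fft.rfftfreq(256, d=2 / 255)
k_lo, k_hi = np.argmin(abs(freqs - 0.5)), np.argmin(abs(freqs - 2.5))
y_f = np.fft.rfft(y.squeeze().numpy())

for step in range(5001):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        pred_f = np.fft.rfft(net(x).detach().squeeze().numpy())
        rel = np.abs(pred_f - y_f) / (np.abs(y_f) + 1e-12)
        # Typically rel[k_lo] shrinks long before rel[k_hi] does.
        print(step, float(rel[k_lo]), float(rel[k_hi]))
```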

Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks

A very universal Frequency Principle (F-Principle), namely that DNNs often fit target functions from low to high frequencies, is demonstrated on high-dimensional benchmark datasets such as MNIST and CIFAR10 and on deep neural networks such as VGG16.

Understanding training and generalization in deep learning by Fourier analysis

This work studies DNN training by Fourier analysis to explain why deep neural networks often achieve remarkably low generalization error, and suggests that small initialization leads to good generalization ability of a DNN while preserving its ability to fit any function.

Explicitizing an Implicit Bias of the Frequency Principle in Two-layer Neural Networks

An effective model of linear F-Principle (LFP) dynamics is proposed which accurately predicts the learning results of two-layer ReLU neural networks (NNs) of large widths and is rationalized by a linearized mean field residual dynamics of NNs.

Towards Understanding the Spectral Bias of Deep Learning

It is proved that the training process of neural networks can be decomposed along different directions defined by the eigenfunctions of the neural tangent kernel, where each direction has its own convergence rate and the rate is determined by the corresponding eigenvalue.
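For reference, the standard linearized (NTK) gradient-flow computation behind statements of this kind, written in LaTeX; this is the textbook form under the fixed-kernel assumption, not necessarily the cited paper's exact statement.

```latex
% Linearized (NTK) gradient flow on the squared loss, with kernel eigenpairs (\lambda_i, v_i):
%   \frac{d}{dt}\,(u_t - f^*) = -K\,(u_t - f^*),
% whose solution decays mode by mode:
\[
  u_t - f^* = \sum_i e^{-\lambda_i t}\,\langle u_0 - f^*,\, v_i\rangle\, v_i ,
\]
% so the error component along eigenfunction v_i converges at rate \lambda_i; eigenfunctions
% with large eigenvalues (typically the low-frequency ones) are learned first.
```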

Theory of the Frequency Principle for General Deep Neural Networks

This work rigorously investigates the F-Principle in the training dynamics of a general DNN at three stages: the initial, intermediate, and final stages. The results are general in the sense that they hold for multilayer networks with general activation functions, general population densities of the data, and a large class of loss functions.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
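A minimal residual-block sketch in PyTorch (channel counts and the use of batch normalization are illustrative choices, not the exact ResNet configuration).

```python
# Minimal residual block: the identity shortcut lets the block learn a
# residual F(x) on top of x, which eases optimization at large depth.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)   # F(x) + x, then nonlinearity

print(BasicBlock()(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])
```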

Deep learning generalizes because the parameter-function map is biased towards simple functions

This paper argues that the parameter-function map of many DNNs should be exponentially biased towards simple functions, and provides clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST.

Effects of Depth, Width, and Initialization: A Convergence Analysis of Layer-wise Training for Deep Linear Neural Networks

A general convergence analysis of BCGD is established and the optimal learning rate, which results in the fastest decrease of the loss, is identified; it is found that the use of deep networks can drastically accelerate convergence compared with a depth-1 network, even when the computational cost is taken into account.
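A minimal NumPy sketch of layer-wise block-coordinate gradient descent on a deep linear network with squared loss; the sizes, initialization, and learning rate are illustrative assumptions, not the optimal rate analyzed in the paper.

```python
# Block coordinate gradient descent: update one layer of a deep linear
# network at a time while holding the other layers fixed.
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
d, n, L = 10, 100, 3
X = rng.normal(size=(d, n))
Y = rng.normal(size=(d, d)) @ X        # linear teacher

Ws = [np.eye(d) + 0.01 * rng.normal(size=(d, d)) for _ in range(L)]

def chain(mats):
    """Product of layers in application order (identity if the list is empty)."""
    return reduce(lambda acc, W: W @ acc, mats, np.eye(d))

lr = 1e-2
for epoch in range(500):
    for l in range(L):                      # one block (layer) at a time
        A = chain(Ws[l + 1:])               # layers applied after layer l
        B = chain(Ws[:l]) @ X               # layers applied before layer l
        R = A @ Ws[l] @ B - Y               # residual of the full network
        Ws[l] -= lr * (A.T @ R @ B.T) / n   # gradient step on this block only

print(np.mean((chain(Ws) @ X - Y) ** 2))    # loss after layer-wise training
```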

On the Spectral Bias of Deep Neural Networks

It is shown that deep networks with finite weights (or trained for finite number of steps) are inherently biased towards representing smooth functions over the input space, and all samples classified by a network to belong to a certain class are connected by a path such that the prediction of the network along that path does not change.