Corpus ID: 221655815

Complexity Measures for Neural Networks with General Activation Functions Using Path-based Norms

@article{Li2020ComplexityMF,
  title={Complexity Measures for Neural Networks with General Activation Functions Using Path-based Norms},
  author={Zhong Li and Chao Ma and Lei Wu},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.06132}
}
A simple approach is proposed to obtain complexity controls for neural networks with general activation functions. The approach is motivated by approximating the general activation functions with one-dimensional ReLU networks, which reduces the problem to the complexity controls of ReLU networks. Specifically, we consider two-layer networks and deep residual networks, for which path-based norms are derived to control complexities. We also provide preliminary analyses of the function spaces… 
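For orientation, the path norm in question has a simple closed form in the two-layer ReLU case; the following is the standard definition used in this line of work (the norms the paper derives for general activations carry extra factors from the ReLU approximation and may differ in detail). For a two-layer network $f_m(x;\theta) = \sum_{k=1}^{m} a_k \max(w_k^\top x,\, 0)$, the path norm is

$$\|\theta\|_{\mathcal{P}} = \sum_{k=1}^{m} |a_k|\,\|w_k\|_1.$$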

Generalization Error Bounds for Deep Neural Networks Trained by SGD

Generalization error bounds for deep neural networks trained by stochastic gradient descent are derived by combining a dynamical control of an appropriate parameter norm with a Rademacher complexity estimate based on parameter norms; the bounds apply to a wide range of network architectures.
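As a point of reference, Rademacher complexity estimates enter generalization bounds through the standard inequality below (a generic textbook bound, not the paper's specific result): for a loss bounded in $[0,1]$ and $n$ i.i.d. samples, with probability at least $1-\delta$, for all $f$ in the hypothesis class $\mathcal{F}$,

$$L(f) \;\le\; \hat{L}_n(f) + 2\,\mathfrak{R}_n(\ell \circ \mathcal{F}) + \sqrt{\frac{\log(1/\delta)}{2n}},$$

where $\mathfrak{R}_n$ denotes the Rademacher complexity; the paper's role for the dynamical norm control is to bound the parameter norms that in turn bound $\mathfrak{R}_n$.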

Characterization of the Variation Spaces Corresponding to Shallow Neural Networks

We consider the variation space corresponding to a dictionary of functions in $L^2(\Omega)$ and present the basic theory of approximation in these spaces. Specifically, we compare the definition…
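One common way to make this precise (a sketch of the usual definition, not necessarily the paper's exact formulation): for a dictionary $\mathbb{D} \subset L^2(\Omega)$ of bounded functions, the variation norm is the gauge of the closed, symmetric convex hull of the dictionary,

$$\|f\|_{\mathcal{K}_1(\mathbb{D})} = \inf\bigl\{\, c > 0 : f \in c\,\overline{\mathrm{conv}}(\pm\mathbb{D}) \,\bigr\},$$

with the closure taken in $L^2(\Omega)$, and the variation space $\mathcal{K}_1(\mathbb{D})$ consists of all $f$ with finite norm.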

Approximation results for Gradient Descent trained Shallow Neural Networks in 1d

This paper provides an approximation result for shallow networks in 1d trained by non-convex weight optimization with gradient descent, in which some form of redundancy reappears as a loss in approximation rate compared with the best possible rates.

Approximation of Functionals by Neural Network without Curse of Dimensionality

A neural network is established to approximate functionals, which are maps from infinite-dimensional spaces to finite-dimensional spaces, and a Barron space of functionals is created.

Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

It is proved that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively, and it is shown that the weight decay regularizer grows exponentially in $d$ if the label 1 is imposed on a ball of radius $\varepsilon$ rather than just at the origin.

Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't

The purpose of this article is to review the achievements made in the last few years towards understanding the reasons behind the success and subtleties of neural network-based machine learning.

Some observations on partial differential equations in Barron and multi-layer spaces

We use explicit representation formulas to show that solutions to certain partial differential equations can be represented efficiently using artificial neural networks, even in high dimension.

Some observations on high-dimensional partial differential equations with Barron data

It is shown that solutions to certain partial differential equations lie in Barron spaces or multilayer spaces if the PDE data lie in such function spaces, and that these solutions can be represented efficiently using artificial neural networks, even in high dimension.

The Barron Space and the Flow-Induced Function Spaces for Neural Network Models

The Barron space is defined and it is shown that it is the right space for two-layer neural network models in the sense that optimal direct and inverse approximation theorems hold for functions in the Barron space.

References

Showing 1-10 of 25 references

Global Capacity Measures for Deep ReLU Networks via Path Sampling

This work shows that, for a large class of networks possessing a positive homogeneity property, capacity bounds may be obtained in terms of the norm of the product of weights. These bounds can be converted to generalization bounds for multi-class classification that are comparable to, and in certain cases improve upon, existing results in the literature.

A Priori Estimates of the Population Risk for Two-Layer Neural Networks

New estimates for the population risk are established for two-layer neural networks. The estimates are a priori in nature, in the sense that the bounds depend only on some norms of the underlying functions to be fitted, not on the parameters in the model, in contrast with most existing results, which are a posteriori in nature.
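Schematically, and suppressing constants and logarithmic factors, a priori bounds of this type take the shape

$$L(\hat{\theta}) \;\lesssim\; \frac{\|f^*\|_{\mathcal{B}}^2}{m} + \frac{\|f^*\|_{\mathcal{B}}}{\sqrt{n}},$$

where $f^*$ is the target function, $\|\cdot\|_{\mathcal{B}}$ its Barron norm, $m$ the network width, and $n$ the sample size; this is a shape sketch drawn from the surrounding literature rather than the paper's exact statement.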

A Priori Estimates of the Population Risk for Residual Networks

Optimal a priori estimates are derived for the population risk, also known as the generalization error, of a regularized residual network model, which treats the skip connections and the nonlinearities differently so that paths with more nonlinearities are regularized by larger weights.
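A schematic reading of that weighting (not necessarily the paper's exact definition) is a weighted path norm of the form

$$\|\theta\|_{\mathrm{WP}} \;=\; \sum_{\text{paths } p} c^{\,n(p)} \prod_{e \in p} |w_e|, \qquad c > 1,$$

where $n(p)$ counts the nonlinear units along path $p$ and the product runs over the weights on that path, so that paths passing through more nonlinearities are penalized more heavily than purely linear skip-connection paths.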

On the Generalization Properties of Minimum-norm Solutions for Over-parameterized Neural Network Models

It is proved that for all three models, the generalization error for the minimum-norm solution is comparable to the Monte Carlo rate, up to some logarithmic terms, as long as the models are sufficiently over-parametrized.

Size-Independent Sample Complexity of Neural Networks

The sample complexity of learning neural networks is studied by providing new bounds on their Rademacher complexity, assuming norm constraints on the parameter matrix of each layer, and under some additional assumptions, these bounds are fully independent of the network size.
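For concreteness, one of the bounds in this line of work has the following shape (a sketch up to universal constants, not the exact statement): for depth-$L$ networks with $1$-Lipschitz, positive-homogeneous activations, weight matrices satisfying $\|W_j\|_F \le M_F(j)$, and inputs bounded in norm by $B$, the Rademacher complexity over $n$ samples satisfies

$$\mathfrak{R}_n \;\lesssim\; \frac{B\,\sqrt{L}\,\prod_{j=1}^{L} M_F(j)}{\sqrt{n}},$$

and the size-independent results remove the explicit dependence on depth and width under the additional assumptions mentioned above.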

Spectrally-normalized margin bounds for neural networks

This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.
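In simplified form (taking the reference matrices to be zero and suppressing constants and logarithmic factors), the spectrally normalized complexity term driving the bound scales roughly as

$$\frac{\|X\|_F}{\gamma\, n}\left(\prod_{i=1}^{L}\|A_i\|_\sigma\right)\left(\sum_{i=1}^{L}\frac{\|A_i\|_{2,1}^{2/3}}{\|A_i\|_\sigma^{2/3}}\right)^{3/2},$$

where $A_1,\dots,A_L$ are the weight matrices, $\|\cdot\|_\sigma$ is the spectral norm, $\|X\|_F$ the Frobenius norm of the data matrix, and $\gamma$ the margin; this is a simplified rendering of the bound, not its exact statement.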

Barron Spaces and the Compositional Function Spaces for Neural Network Models

This paper defines Barron space and shows that it is the right space for two-layer neural network models in the sense that optimal direct and inverse approximation theorems hold for functions in the Barron space.
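In this framework a two-layer ReLU target is written as an expectation over features, $f(x) = \mathbb{E}_{(a,b,c)\sim\rho}\!\left[a\,\max(b^\top x + c,\,0)\right]$, and the Barron norm is, in one standard formulation (a sketch of the definition, details may differ from the paper's),

$$\|f\|_{\mathcal{B}} \;=\; \inf_{\rho} \mathbb{E}_{(a,b,c)\sim\rho}\bigl[\,|a|\,(\|b\|_1 + |c|)\,\bigr],$$

where the infimum is taken over all probability distributions $\rho$ representing $f$.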

Mean Field Analysis of Deep Neural Networks

This work rigorously establishes the limiting behavior of the multilayer neural network output and shows that, under suitable assumptions on the activation functions and the behavior for large times, the limit neural network recovers a global minimum.

Approximation and Estimation for High-Dimensional Deep Learning Networks

The heart of the analysis is the development of a sampling strategy that demonstrates the accuracy of a sparse covering of deep ramp networks, and lower bounds show that the identified risk is close to being optimal.

Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks

This work uncovers a phenomenon in which the behavior of these complex networks -- under suitable scalings and stochastic gradient descent dynamics -- becomes independent of the number of neurons as this number grows sufficiently large.