Analysis of the rate of convergence of fully connected deep neural network regression estimates with smooth activation function

@article{Langer2020,
  title={Analysis of the rate of convergence of fully connected deep neural network regression estimates with smooth activation function},
  author={Sophie Langer},
  journal={J. Multivar. Anal.},
  year={2020}
}

  • S. Langer
  • Published 12 October 2020
  • Computer Science
  • J. Multivar. Anal.

Figures from this paper

On the universal consistency of an over-parametrized deep neural network estimate learned by gradient descent

It is shown that in case of a suitable random initialization of the network, a suitable small stepsize of the gradient descent, and a number of gradient descent steps which is slightly larger than the reciprocal of the stepsize, the estimate is universally consistent in the sense that its expected L2 error converges to zero for all distributions of the data where the response variable is square integrable.
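The training scheme summarized above (random initialization, small stepsize, roughly the reciprocal of the stepsize many gradient steps) can be illustrated with a minimal sketch. All names, shapes, and the choice to train only the outer weights of a one-hidden-layer network are illustrative assumptions, not the paper's actual construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: scalar regression with a square-integrable response.
n = 200
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Over-parametrized one-hidden-layer network with a smooth (logistic)
# activation; inner weights and biases get a random initialization.
width = 1000
W = rng.standard_normal((1, width))   # inner weights, randomly initialized, kept fixed here
b = rng.standard_normal(width)        # biases, randomly initialized, kept fixed here
c = np.zeros(width)                   # outer weights, trained by gradient descent


def sigma(u):
    # logistic squashing function (a smooth activation)
    return 1.0 / (1.0 + np.exp(-u))


# Small stepsize, and slightly more steps than the reciprocal of the stepsize.
stepsize = 1e-3
steps = int(1.2 / stepsize)

H = sigma(X @ W + b)                  # hidden-layer features, fixed since W, b are fixed
for _ in range(steps):
    resid = H @ c - y                 # current residuals
    grad = H.T @ resid / n            # gradient of the empirical L2 risk in c
    c -= stepsize * grad

mse = float(np.mean((H @ c - y) ** 2))
```

With a stable stepsize the empirical L2 risk decreases monotonically, so `mse` ends below the risk of the zero initial estimate; the consistency statement itself of course concerns the expected L2 error as the sample size grows, which a fixed-n sketch cannot show.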

Estimation of a regression function on a manifold by fully connected deep neural networks

VC dimension of partially quantized neural networks in the overparametrized regime

It is shown that HANNs can have VC dimension significantly smaller than the number of weights while being highly expressive, and that empirical risk minimization over HANNs in the overparametrized regime achieves the minimax rate for classification with Lipschitz posterior class probability.

Analysis of convolutional neural network image classifiers in a rotationally symmetric model

Under suitable structural and smoothness assumptions on the a posteriori probability, it is shown that least squares plug-in classifiers based on convolutional neural networks are able to circumvent the curse of dimensionality in binary image classification if one neglects a resolution-dependent error term.

Research on improved convolutional wavelet neural network

A wavelet neural network (WNN) is implemented that can solve the problems of BPNN and RBFNN and achieve better performance, and the proposed wavelet-based convolutional neural network (WCNN) can reduce the mean square error and the error rate of CNN, which means WCNN achieves better maximum precision than CWNN.

Statistical theory for image classification using deep convolutional neural networks with cross-entropy loss

Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these estimates achieve a rate of convergence which is independent of the dimension of the image.

Music Genre Classification Based on Deep Learning

Experimental results show that the proposed method can effectively improve the accuracy of music classification and is helpful for music channel classification.

On the Rate of Convergence of a Classifier Based on a Transformer Encoder

It is shown that this Transformer classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model.

A Nonlinear Autoregressive Exogenous (NARX) Neural Network Model for the Prediction of Timestamp Influence on Bitcoin Value

Simulation analysis indicates that the performance variation of the Bitcoin digital currency is highly influenced by its transaction timestamp, with a prediction accuracy of 96%; the contribution of this research lies in the finding that specific Bitcoin transaction events repeat themselves over and over again.

A Material Removal Prediction Method Based On Multi-Scale Attention Mechanism

The exact removal of material in abrasive belt grinding determines the final machining quality of the workpiece. However, it is difficult to determine the removal state of materials in actual machining.



References

On the rate of convergence of fully connected very deep neural network regression estimates

This paper shows that it is possible to get similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions, based on new approximation results concerning deep neural networks.

Nonparametric regression using deep neural networks with ReLU activation function

The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.

Estimation of a Function of Low Local Dimensionality by Deep Neural Networks

It is shown that the least squares regression estimates using DNNs are able to achieve dimensionality reduction in case that the regression function has locally low dimensionality.

Convergence rates for single hidden layer feedforward networks

Universal approximation bounds for superpositions of a sigmoidal function

  • A. Barron
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1993
The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order 1/n^{2/d} uniformly for functions satisfying the same smoothness assumption.
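The two results summarized above can be stated compactly. This is a paraphrase under assumed notation, not a verbatim quote: $C_f = \int |\omega|\,|\tilde f(\omega)|\,d\omega$ denotes the first Fourier moment of $f$, $B_r$ a ball of radius $r$, and $\mu$ a probability measure on $B_r$:

```latex
% Upper bound: for every f with C_f < \infty there is a linear combination
% f_n of n sigmoidal units whose integrated squared error decays like 1/n,
% with no explicit dependence of the rate on the dimension d:
\[
  \int_{B_r} \bigl( f(x) - f_n(x) \bigr)^2 \, \mu(dx)
  \;\le\; \frac{(2 r C_f)^2}{n}.
\]
% Lower bound: for linear combinations of any n fixed basis functions
% h_1, \dots, h_n, the worst-case error over the same smoothness class
% cannot beat order (1/n)^{2/d}, which degrades rapidly with d:
\[
  \sup_{f \,:\, C_f \le C}\;
  \inf_{g \in \operatorname{span}\{h_1, \dots, h_n\}}
  \| f - g \|_{L^2(\mu)}^2
  \;\gtrsim\; \Bigl(\frac{1}{n}\Bigr)^{2/d}.
\]
```

The contrast between the dimension-free 1/n rate and the (1/n)^{2/d} barrier for fixed bases is what makes the network parameterization advantageous in high dimensions.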

Approximation and estimation bounds for artificial neural networks

  • A. Barron
  • Computer Science
    Machine Learning
  • 1994
The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.

On deep learning as a remedy for the curse of dimensionality in nonparametric regression

It is shown that least squares estimates based on multilayer feedforward neural networks are able to circumvent the curse of dimensionality in nonparametric regression.

The phase diagram of approximation rates for deep neural networks

It is proved that using both sine and ReLU activations theoretically leads to very fast, nearly exponential approximation rates, thanks to the emerging capability of the network to implement efficient lookup operations.

Neural Network Learning - Theoretical Foundations

The authors explain the role of scale-sensitive versions of the Vapnik-Chervonenkis dimension in large margin classification and in real prediction, and discuss the computational complexity of neural network learning.