Nonparametric regression using deep neural networks with ReLU activation function

@article{SchmidtHieber2020NonparametricRU,
  title={Nonparametric regression using deep neural networks with ReLU activation function},
  author={Johannes Schmidt-Hieber},
  journal={ArXiv},
  year={2020},
  volume={abs/1708.06633}
}
Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network… 
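As a rough sketch of the framework the abstract refers to (paraphrased from the paper's setup; exact conditions and constants are omitted here), the data follow the regression model

$Y_i = f_0(X_i) + \varepsilon_i, \quad i = 1, \ldots, n, \quad X_i \in [0,1]^d, \; \varepsilon_i \sim \mathcal{N}(0,1),$

and the composition assumption states that

$f_0 = g_q \circ g_{q-1} \circ \cdots \circ g_0,$

where each coordinate of $g_i$ is $\beta_i$-Hölder smooth and depends on at most $t_i$ of its arguments. With the effective smoothness indices $\beta_i^* = \beta_i \prod_{\ell=i+1}^{q} \min(\beta_\ell, 1)$, the rate attained (up to $\log n$-factors) by sparsely connected deep ReLU least-squares estimators, and shown to be minimax optimal, is

$\phi_n = \max_{i=0,\ldots,q} n^{-2\beta_i^*/(2\beta_i^* + t_i)}.$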

Citations

Robust Nonparametric Regression with Deep Neural Networks
  • G. Shen, J. Huang
  • Computer Science, Mathematics
  • 2021
TLDR
Simulation studies demonstrate that the robust methods can significantly outperform the least squares method when the errors have heavy-tailed distributions, and illustrate that the choice of loss function is important in the context of deep nonparametric regression.
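To make the point about loss choice concrete, here is a minimal PyTorch sketch (an illustration only, not the authors' implementation) that fits the same ReLU network under a least-squares loss and a Huber loss on synthetic data with heavy-tailed noise; all names and hyperparameters below are illustrative.

    # Illustration only (not from the cited paper): fit the same ReLU network
    # under a least-squares loss and a robust Huber loss on heavy-tailed data.
    import torch
    import torch.nn as nn

    def fit(loss_fn, X, y, epochs=500):
        net = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(),
                            nn.Linear(64, 64), nn.ReLU(),
                            nn.Linear(64, 1))
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            loss_fn(net(X), y).backward()
            opt.step()
        return net

    # Synthetic data: smooth signal plus heavy-tailed (Student-t, df=2) noise.
    torch.manual_seed(0)
    X = torch.rand(500, 3)
    noise = torch.distributions.StudentT(df=2.0).sample((500, 1))
    y = torch.sin(2 * torch.pi * X[:, :1]) + X[:, 1:2] ** 2 + 0.5 * noise

    net_ls = fit(nn.MSELoss(), X, y)                 # least squares
    net_huber = fit(nn.HuberLoss(delta=1.0), X, y)   # robust alternative

On data like this the Huber fit is typically far less affected by occasional extreme responses, which is the behaviour the simulation studies above refer to.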
On the rate of convergence of fully connected very deep neural network regression estimates
  • M. Kohler
TLDR
This paper shows that it is possible to obtain similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions, based on new approximation results concerning deep neural networks.
How do noise tails impact on deep ReLU networks?
TLDR
This work unveils how the optimal rate of convergence depends on $p$, the degree of smoothness, and the intrinsic dimension in a class of nonparametric regression functions with hierarchical composition structure when both the adaptive Huber loss and deep ReLU neural networks are used.
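For reference, the Huber loss with truncation level $\tau$ referred to above is the standard

$\ell_\tau(u) = \begin{cases} u^2/2, & |u| \le \tau, \\ \tau|u| - \tau^2/2, & |u| > \tau, \end{cases}$

and "adaptive" in this line of work usually means that $\tau$ is calibrated to grow with the sample size (and the assumed noise moments) so as to trade off robustness against bias.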
Measurement error models: from nonparametric methods to deep neural networks
TLDR
This paper proposes an efficient neural network design for estimating measurement error models, which utilizes recent advances in variational inference for deep neural networks, such as the importance weight autoencoder, doubly reparametrized gradient estimator, and non-linear independent components estimation.
Statistical Learning using Sparse Deep Neural Networks in Empirical Risk Minimization
TLDR
It is derived that the SDRN estimator can achieve the same minimax rate of estimation as one-dimensional nonparametric regression when the dimension of the features is fixed, and the estimator has a suboptimal rate when the dimension grows with the sample size.
Analysis of the rate of convergence of neural network regression estimates which are easy to implement.
TLDR
This article introduces a new neural network regression estimate in which, motivated by recent approximation results for neural networks, most of the weights are chosen independently of the data; the estimate is therefore easy to implement and achieves the one-dimensional rate of convergence.
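As a schematic sketch of that general recipe (data-independent inner weights, with only the output layer fitted to the data), one might write the following; note that the article's actual weight choices are dictated by its approximation results rather than by the random draw used here, and all names below are illustrative.

    # Schematic sketch: ReLU features with inner weights chosen without the
    # data, and only the output layer fitted by (regularized) least squares.
    import numpy as np

    rng = np.random.default_rng(0)

    def fit_fixed_feature_net(X, y, n_hidden=200, ridge=1e-6):
        d = X.shape[1]
        W = rng.normal(size=(d, n_hidden))          # inner weights: data-independent
        b = rng.uniform(-1.0, 1.0, size=n_hidden)   # biases: data-independent
        H = np.maximum(X @ W + b, 0.0)              # hidden ReLU features
        beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
        return W, b, beta

    def predict(Xnew, W, b, beta):
        return np.maximum(Xnew @ W + b, 0.0) @ beta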

References

Showing 1-10 of 91 references
Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science
TLDR
A method to design neural networks as sparse scale-free networks is presented, which reduces the computational time required for training and inference and has the potential to enable artificial neural networks to scale up beyond what is currently possible.
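A minimal sketch of the underlying idea of training under sparse connectivity, shown here with a fixed random binary mask; the cited method additionally uses a scale-free topology and rewires connections during training, which is not reproduced here, and all names below are illustrative.

    # Sketch: enforce sparse connectivity through a binary mask applied to a
    # dense weight matrix; the cited method also rewires the mask over time.
    import numpy as np

    rng = np.random.default_rng(1)

    def sparse_mask(n_in, n_out, density=0.05):
        """Keep roughly a `density` fraction of the possible connections."""
        return (rng.random((n_in, n_out)) < density).astype(float)

    n_in, n_out = 784, 300
    mask = sparse_mask(n_in, n_out)
    W = rng.normal(scale=0.1, size=(n_in, n_out)) * mask

    def layer_forward(X, W, mask):
        # Re-apply the mask so pruned connections never contribute.
        return np.maximum(X @ (W * mask), 0.0)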
Approximation and Estimation for High-Dimensional Deep Learning Networks
TLDR
The heart of the analysis is the development of a sampling strategy that demonstrates the accuracy of a sparse covering of deep ramp networks, and lower bounds show that the identified risk is close to being optimal.
Breaking the Curse of Dimensionality with Convex Neural Networks
  • F. Bach
  • Computer Science
    J. Mach. Learn. Res.
  • 2017
TLDR
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
Optimal approximation of continuous functions by very deep ReLU networks
TLDR
It is proved that constant-width fully connected networks of depth $L\sim W$ provide the fastest possible approximation rate $\|f-\widetilde f\|_\infty = O(\omega_f(O(W^{-2/\nu})))$, which cannot be achieved with less deep networks.
Why Deep Neural Networks for Function Approximation?
TLDR
It is shown that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation.
On the Number of Linear Regions of Deep Neural Networks
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have.
Adaptive Approximation and Estimation of Deep Neural Network to Intrinsic Dimensionality
TLDR
It is theoretically proved that the generalization performance of deep neural networks (DNNs) is mainly determined by an intrinsic low-dimensional structure of data, and DNNs outperform other non-parametric estimators which are also adaptive to the intrinsic dimension.
Deep ReLU network approximation of functions on a manifold
TLDR
This work studies a regression problem with inputs on a $d^*$-dimensional manifold that is embedded into a space with potentially much larger ambient dimension, and derives statistical convergence rates for the estimator minimizing the empirical risk over all possible choices of bounded network parameters.
Convergence rates for single hidden layer feedforward networks