Corpus ID: 237571381

Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

@article{Parhi2021NearMinimaxOE,
  title={Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks},
  author={Rahul Parhi and Robert D. Nowak},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.08844}
}
We study the problem of estimating an unknown function from noisy data using shallow (single-hidden layer) ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating… 
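For concreteness, training with weight decay minimizes a least-squares objective of the form $\sum_{i=1}^{N} (f_\theta(x_i) - y_i)^2 + \lambda \|\theta\|_2^2$ over the network parameters $\theta$. A minimal sketch is given below, assuming PyTorch; the network width, regularization strength, and optimizer settings are illustrative choices, not values from the paper.

  import torch
  import torch.nn as nn

  def fit_shallow_relu(x, y, width=500, lam=1e-3, steps=5000, lr=1e-2):
      # Single-hidden-layer ReLU network: f(x) = V relu(Wx + b) + c.
      net = nn.Sequential(nn.Linear(x.shape[1], width), nn.ReLU(), nn.Linear(width, 1))
      # weight_decay imposes the squared-Euclidean-norm penalty on the parameters
      # (the "weight decay" regularization referred to in the abstract).
      opt = torch.optim.SGD(net.parameters(), lr=lr, weight_decay=lam)
      for _ in range(steps):
          opt.zero_grad()
          loss = ((net(x) - y) ** 2).sum()  # sum of squared data-fitting errors
          loss.backward()
          opt.step()
      return net

Predictions at new inputs are then obtained by evaluating the trained network, net(x_new).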

Citations

Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?
We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness…

References

SHOWING 1-10 OF 44 REFERENCES
A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
TLDR
This paper characterizes the norm required to realize a function as a single hidden-layer ReLU network with an unbounded number of units but bounded Euclidean norm of the weights, and precisely characterizes which functions can be realized with finite norm.
Breaking the Curse of Dimensionality with Convex Neural Networks
  • F. Bach
  • Computer Science
    J. Mach. Learn. Res.
  • 2017
TLDR
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions such as the rectified linear unit, and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
Sharp Bounds on the Approximation Rates, Metric Entropy, and $n$-widths of Shallow Neural Networks
TLDR
The notion of a smoothly parameterized dictionary is introduced and upper bounds on the non-linear approximation rates, metric entropy and n-widths of variation spaces corresponding to shallow neural networks with a variety of activation functions are given.
Deep Neural Networks Learn Non-Smooth Functions Effectively
TLDR
It is shown that DNN estimators are nearly optimal for estimating non-smooth functions, while some of the popular alternative models do not attain the optimal rate.
Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality
TLDR
A new approximation and estimation error analysis of deep learning with the ReLU activation, for functions in a Besov space and its variant with mixed smoothness, shows that deep learning has higher adaptivity to the spatial inhomogeneity of the target function than other estimators such as linear ones.
Minimax estimation via wavelet shrinkage
TLDR
A nonlinear method that works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple threshold nonlinear estimators are nearly minimax (see the thresholding sketch after this reference list).
How do infinite width bounded norm networks look in function space?
TLDR
This work considers which functions can be captured by ReLU networks with an unbounded number of units but bounded overall Euclidean norm of the weights; equivalently, it asks what is the minimal norm required to approximate a given function.
Characterization of the Variation Spaces Corresponding to Shallow Neural Networks
We consider the variation space corresponding to a dictionary of functions in $L^2(\Omega)$ and present the basic theory of approximation in these spaces. Specifically, we compare the definition…
Ridgelets: estimating with ridge functions
TLDR
In a nonparametric regression setting, this article suggests expanding noisy data into a ridgelet series and applying a scalar nonlinearity to the coefficients (damping); this is unlike existing approaches based on stepwise additions of elements.
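The reference "Minimax estimation via wavelet shrinkage" above concerns simple threshold nonlinear estimators of the empirical wavelet coefficients. A minimal sketch of one such rule, soft thresholding at the universal level $\sigma\sqrt{2\log n}$, is given below; it assumes the PyWavelets library, a known noise level sigma, and an illustrative choice of wavelet and decomposition depth.

  import numpy as np
  import pywt

  def wavelet_shrink(y, sigma, wavelet="db4", level=4):
      # Expand the noisy samples into empirical wavelet coefficients.
      coeffs = pywt.wavedec(y, wavelet, level=level)
      thresh = sigma * np.sqrt(2.0 * np.log(len(y)))  # universal threshold
      # Keep the coarse approximation; soft-threshold (shrink) the detail coefficients.
      shrunk = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
      return pywt.waverec(shrunk, wavelet)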