# Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

@article{Parhi2021NearMinimaxOE, title={Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks}, author={Rahul Parhi and Robert D. Nowak}, journal={ArXiv}, year={2021}, volume={abs/2109.08844} }

We study the problem of estimating an unknown function from noisy data using shallow (single-hidden layer) ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating…
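The abstract describes the estimator as a penalized least-squares fit. The sketch below writes out that objective for a shallow ReLU network so that "training with weight decay" has a concrete meaning. It rests on three assumptions of our own. First, the network form is `f(x) = sum_k v_k * relu(w_k . x - b_k) + c`. Second, the penalty uses the *squared* Euclidean norm with a factor of one half. Third, only the input weights `W` and output weights `v` are penalized, and the biases `b` and `c` are left free. The function names are illustrative and are not taken from the paper.

```python
import numpy as np

def shallow_relu(x, W, b, v, c):
    """f(x) = sum_k v_k * relu(w_k . x - b_k) + c for inputs x of shape (n, d).

    Shapes (assumed): W is (K, d), b and v are (K,), c is a scalar.
    """
    return np.maximum(x @ W.T - b, 0.0) @ v + c

def weight_decay_objective(x, y, W, b, v, c, lam):
    """Sum of squared residuals plus (lam / 2) * squared Euclidean norm of the weights.

    Assumption: only W and v are penalized; the biases b and c are unregularized.
    """
    residuals = y - shallow_relu(x, W, b, v, c)
    data_fit = np.sum(residuals ** 2)
    penalty = 0.5 * lam * (np.sum(W ** 2) + np.sum(v ** 2))
    return data_fit + penalty
```

A gradient-based optimizer minimizing this objective adds `lam * W` and `lam * v` to the weight gradients. That extra term is exactly what the "weight decay" option in common training loops implements.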

## One Citation

Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?

- Computer Science; ArXiv
- 2022

We study the theory of neural network (NN) from the lens of classical nonparametric regression problems with a focus on NN’s ability to adaptively estimate functions with heterogeneous smoothness — a…

## References

Showing references 1-10 of 44.

On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces

- Computer Science; Neural Networks
- 2020

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

- Computer Science; ICLR
- 2020

This paper characterizes the norm required to realize a function as a single-hidden-layer ReLU network with an unbounded number of units but a bounded Euclidean norm of the weights, and it precisely characterizes which functions can be realized with finite norm.
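The work on bounded-norm ReLU networks typically starts from a standard rescaling argument. The ReLU is positively homogeneous, so for any `a > 0` the unit `(w_k, b_k, v_k)` can be replaced by `(a w_k, a b_k, v_k / a)` without changing the function. By the AM-GM inequality, the smallest weight-decay penalty `(1/2)(||w_k||^2 + v_k^2)` over such rescalings is `|v_k| ||w_k||`. Summed over units, this gives the "path norm" `sum_k |v_k| ||w_k||`. The sketch below checks this numerically. The network form, the unpenalized biases, and the helper names are assumptions made for illustration.

```python
import numpy as np

def relu_net(x, W, b, v):
    """f(x) = sum_k v_k * relu(w_k . x - b_k) for inputs x of shape (n, d)."""
    return np.maximum(x @ W.T - b, 0.0) @ v

def half_sq_norm(W, v):
    """Weight-decay penalty (1/2)(||W||_F^2 + ||v||^2); biases are not penalized."""
    return 0.5 * (np.sum(W ** 2) + np.sum(v ** 2))

def path_norm(W, v):
    """sum_k |v_k| * ||w_k||_2, which is unchanged by per-unit rescaling."""
    return np.sum(np.abs(v) * np.linalg.norm(W, axis=1))

def rebalance(W, b, v):
    """Rescale every unit so that ||w_k|| == |v_k| while leaving f unchanged.

    Uses relu(a * z) = a * relu(z) for a > 0, so
    (w_k, b_k, v_k) -> (a_k w_k, a_k b_k, v_k / a_k) preserves the function.
    Assumes every w_k and v_k is nonzero (inactive units would be pruned first).
    """
    alpha = np.sqrt(np.abs(v) / np.linalg.norm(W, axis=1))
    return W * alpha[:, None], b * alpha, v / alpha
```

After rebalancing, `half_sq_norm` equals `path_norm`, and for any other scaling it is at least as large. Minimizing the weight-decay penalty over networks that realize a fixed function is therefore the same as minimizing the path norm.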

Breaking the Curse of Dimensionality with Convex Neural Networks

- Computer Science; J. Mach. Learn. Res.
- 2017

This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions such as the rectified linear unit, and it shows that these networks adapt to unknown underlying linear structures, such as dependence only on the projection of the input variables onto a low-dimensional subspace.

Sharp Bounds on the Approximation Rates, Metric Entropy, and $n$-widths of Shallow Neural Networks

- Computer Science, Mathematics
- 2021

The notion of a smoothly parameterized dictionary is introduced and upper bounds on the non-linear approximation rates, metric entropy and n-widths of variation spaces corresponding to shallow neural networks with a variety of activation functions are given.

Deep Neural Networks Learn Non-Smooth Functions Effectively

- Computer Science; AISTATS
- 2019

It is shown that DNN estimators are almost optimal for estimating non-smooth functions, while some popular alternative models do not attain the optimal rate.

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality

- Computer Science; ICLR
- 2019

A new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness shows that deep learning has higher adaptivity to the spatial inhomogeneity of the target function than other estimators such as linear ones.

Minimax estimation via wavelet shrinkage

- Mathematics, Computer Science
- 1998

A nonlinear method that works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple nonlinear threshold estimators are shown to be nearly minimax.
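Wavelet shrinkage is the classical benchmark for locally adaptive estimation. A minimal sketch of the idea follows: transform the noisy samples, soft-threshold the detail coefficients, and invert the transform. The Haar basis, the known noise level `sigma`, and the "universal" threshold `sigma * sqrt(2 log n)` are all our own simplifying choices. The referenced paper covers more general wavelet bases and threshold rules, so treat this as one simple instance rather than its exact procedure. The signal length is assumed to be a power of two.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a length-2**J signal.

    Returns the final scaling coefficient and a list of detail
    coefficient arrays ordered from the finest to the coarsest level.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_inverse(approx, details):
    """Exactly invert haar_forward, rebuilding from the coarsest level up."""
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def soft_threshold(c, t):
    """Shrink each coefficient toward zero by t, zeroing those with |c| <= t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def wavelet_shrink(y, sigma):
    """Denoise y (length a power of 2) given the noise standard deviation sigma.

    Applies the universal threshold sigma * sqrt(2 log n) to every detail
    coefficient and leaves the coarsest scaling coefficient untouched.
    """
    n = len(y)
    t = sigma * np.sqrt(2.0 * np.log(n))
    approx, details = haar_forward(y)
    return haar_inverse(approx, [soft_threshold(d, t) for d in details])
```

The transform is orthonormal, so white noise stays white in the coefficient domain. A piecewise-smooth signal, by contrast, concentrates in a few large coefficients. Thresholding therefore removes most of the noise while keeping the features, including sharp jumps, that a linear smoother would blur.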

How do infinite width bounded norm networks look in function space?

- Computer Science, Mathematics; COLT
- 2019

This paper asks which functions can be captured by ReLU networks with an unbounded number of units but a bounded overall Euclidean norm of the weights. Equivalently, it asks what minimal norm is required to approximate a given function.

Characterization of the Variation Spaces Corresponding to Shallow Neural Networks

- Mathematics, Computer Science; ArXiv
- 2021

We consider the variation space corresponding to a dictionary of functions in $L^2(\Omega)$ and present the basic theory of approximation in these spaces. Specifically, we compare the definition…

Ridgelets: estimating with ridge functions

- Computer Science
- 2003

In a nonparametric regression setting, this article suggests expanding noisy data into a ridgelet series and applying a scalar nonlinearity to the coefficients (damping); this is unlike existing approaches based on stepwise additions of elements.