Corpus ID: 237571381

Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

@article{Parhi2021NearMinimaxOE,
  title={Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks},
  author={Rahul Parhi and Robert D. Nowak},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.08844}
}
Abstract: We study the problem of estimating an unknown function from noisy data using shallow ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function…
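
The weight-decay training described in the abstract can be sketched in a few lines. The code below is a minimal illustration only, assuming a PyTorch setup with synthetic data and an arbitrary regularization strength lambda_reg; it is not the authors' implementation, and the width, learning rate, and penalty value are placeholders.

  import math
  import torch

  # Shallow (single-hidden-layer) ReLU network: f(x) = sum_k v_k * relu(w_k . x + b_k)
  class ShallowReLU(torch.nn.Module):
      def __init__(self, d_in, width):
          super().__init__()
          self.hidden = torch.nn.Linear(d_in, width)           # inner weights w_k, biases b_k
          self.output = torch.nn.Linear(width, 1, bias=False)  # outer weights v_k
      def forward(self, x):
          return self.output(torch.relu(self.hidden(x)))

  # Synthetic noisy samples standing in for the unknown data-generating function
  torch.manual_seed(0)
  n, d = 200, 2
  X = torch.rand(n, d)
  y = torch.sin(2 * math.pi * X[:, :1]) + 0.1 * torch.randn(n, 1)

  model = ShallowReLU(d_in=d, width=100)
  lambda_reg = 1e-3  # regularization strength (illustrative value)

  # Apply weight decay (an L2 penalty) to the weights only, leaving the biases
  # unpenalized, via optimizer parameter groups.
  decay = [p for name, p in model.named_parameters() if not name.endswith("bias")]
  no_decay = [p for name, p in model.named_parameters() if name.endswith("bias")]
  opt = torch.optim.SGD(
      [{"params": decay, "weight_decay": lambda_reg},
       {"params": no_decay, "weight_decay": 0.0}],
      lr=1e-2,
  )
  loss_fn = torch.nn.MSELoss(reduction="sum")  # sum of squared data-fitting errors

  for step in range(5000):
      opt.zero_grad()
      loss = loss_fn(model(X), y)
      loss.backward()
      opt.step()  # gradient step on squared error + weight-decay penalty

PyTorch's weight_decay option adds an L2 penalty on the selected parameter groups, so the objective being minimized is the sum of squared errors plus a term proportional to the squared Euclidean norm of the network weights.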

Citations

Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?
TLDR
It is established that, by tuning only the weight decay, such parallel NNs achieve an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes, with the gap to minimax optimality shrinking exponentially as the network gets deeper.
Intrinsic dimensionality and generalization properties of the $\mathcal{R}$-norm inductive bias
TLDR
It is found that R-norm minimizing interpolants of datasets labeled by specific target functions are intrinsically multivariate functions, even when there are ridge functions that fit the data, and also that the R-norm inductive bias is not sufficient for achieving statistically optimal generalization for certain learning problems.

References

SHOWING 1-10 OF 51 REFERENCES
Nonparametric regression using deep neural networks with ReLU activation function
TLDR
The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.
Banach Space Representer Theorems for Neural Networks and Ridge Splines
TLDR
A variational framework is developed to understand the properties of the functions learned by neural networks fit to data, and a representer theorem is derived showing that finite-width, single-hidden-layer neural networks are solutions to inverse problems with total variation-like regularization.
What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory
TLDR
A new function space, which is reminiscent of classical bounded variation spaces, that captures the compositional structure associated with deep neural networks is proposed, and a representer theorem is derived showing that deep ReLU networks are solutions to regularized data fitting problems in this function space.
Breaking the Curse of Dimensionality with Convex Neural Networks
  • F. Bach
  • Computer Science
    J. Mach. Learn. Res.
  • 2017
TLDR
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions, such as rectified linear units, and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
Sharp Bounds on the Approximation Rates, Metric Entropy, and $n$-widths of Shallow Neural Networks
TLDR
The notion of a smoothly parameterized dictionary is introduced and upper bounds on the non-linear approximation rates, metric entropy and n-widths of variation spaces corresponding to shallow neural networks with a variety of activation functions are given.
Minimax estimation via wavelet shrinkage
TLDR
A nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple threshold nonlinear estimators are nearly minimax.
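
As a rough illustration of the shrinkage idea summarized above (not the procedure of this specific paper), the sketch below soft-thresholds empirical wavelet coefficients of noisy samples using the PyWavelets package; the db4 wavelet, noise level, and the universal threshold sigma * sqrt(2 log n) are illustrative choices.

  import numpy as np
  import pywt  # PyWavelets

  rng = np.random.default_rng(0)
  n = 1024
  t = np.linspace(0.0, 1.0, n)
  signal = np.sign(np.sin(4 * np.pi * t))       # piecewise-constant test function
  sigma = 0.3
  y = signal + sigma * rng.standard_normal(n)   # noisy observations

  # Empirical wavelet coefficients of the data
  coeffs = pywt.wavedec(y, "db4", level=5)

  # Soft-threshold the detail coefficients; sigma * sqrt(2 log n) is the
  # classical "universal" threshold from the wavelet shrinkage literature.
  lam = sigma * np.sqrt(2.0 * np.log(n))
  shrunk = [coeffs[0]] + [pywt.threshold(c, lam, mode="soft") for c in coeffs[1:]]

  estimate = pywt.waverec(shrunk, "db4")[:n]    # denoised estimate of the underlying function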
A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
TLDR
This paper characterizes the norm required to realize a function as a single-hidden-layer ReLU network with an unbounded number of units but a bounded Euclidean norm of the weights, including a precise characterization of which functions can be realized with finite norm.
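
For context, a standard rescaling identity (sketched here under the usual convention that biases are not penalized; it is background for this line of work, not this paper's own derivation) explains why bounding the Euclidean norm of the weights of a single-hidden-layer ReLU network is equivalent to bounding a product of inner and outer weight norms. Write the network as

  f_\theta(x) = \sum_{k=1}^{K} v_k \, \rho(w_k^\top x + b_k), \qquad \rho(t) = \max\{0, t\}.

Because \rho is positively homogeneous, the rescaling w_k \mapsto \alpha_k w_k, b_k \mapsto \alpha_k b_k, v_k \mapsto v_k/\alpha_k (with \alpha_k > 0) leaves f_\theta unchanged, and by the AM-GM inequality

  \min_{\alpha_k > 0} \; \frac{1}{2} \sum_{k=1}^{K} \Bigl( \frac{v_k^2}{\alpha_k^2} + \alpha_k^2 \, \lVert w_k \rVert_2^2 \Bigr) \;=\; \sum_{k=1}^{K} \lvert v_k \rvert \, \lVert w_k \rVert_2,

with the minimum attained at \alpha_k^2 = \lvert v_k \rvert / \lVert w_k \rVert_2. Minimizing the squared Euclidean norm of the weights over all networks realizing a given function therefore amounts to minimizing \sum_k \lvert v_k \rvert \lVert w_k \rVert_2, the per-unit quantity underlying the norms studied in this line of work and the weight-decay estimators in the paper above.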
Deep Neural Networks Learn Non-Smooth Functions Effectively
TLDR
It is shown that DNN estimators are nearly optimal for estimating non-smooth functions, while some other popular models do not attain the optimal rate.
Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality
TLDR
A new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness shows that deep learning has higher adaptivity to the spatial inhomogeneity of the target function than other estimators such as linear ones.
...