# On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces

@article{Hayakawa2020OnTM, title={On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces}, author={Satoshi Hayakawa and Taiji Suzuki}, journal={Neural Networks: the official journal of the International Neural Network Society}, year={2020}, volume={123}, pages={343--361} }

## 23 Citations

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

- Computer Science, ICLR
- 2021

It is shown that deep learning outperforms any linear estimator in the sense of the minimax optimal rate, especially in high-dimensional settings, and a so-called fast learning rate is obtained.
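For context, the "superiority in the sense of the minimax optimal rate" claimed here and in the title paper can be sketched as follows (notation is illustrative, not taken from the cited abstract): deep learning beats linear estimators when its worst-case risk over the parameter space decays strictly faster than the best achievable linear rate.

```latex
% Worst-case (minimax) risk over a parameter space \Theta, sample size n:
% linear estimators are rate-limited, while a DNN estimator \hat f_{\mathrm{DL}}
% can attain a strictly faster rate (\alpha_{\mathrm{DL}} > \alpha_{\mathrm{lin}}).
\inf_{\hat f \,\in\, \mathcal{F}_{\mathrm{lin}}}
  \sup_{f^{*} \in \Theta} \mathbb{E}\,\| \hat f - f^{*} \|_{L^2}^{2}
  \;\asymp\; n^{-\alpha_{\mathrm{lin}}},
\qquad
\sup_{f^{*} \in \Theta} \mathbb{E}\,\| \hat f_{\mathrm{DL}} - f^{*} \|_{L^2}^{2}
  \;\lesssim\; n^{-\alpha_{\mathrm{DL}}},
\quad \alpha_{\mathrm{DL}} > \alpha_{\mathrm{lin}}.
```

The gap between the two exponents typically widens with the input dimension, which is why several of the papers below emphasize high-dimensional or spatially inhomogeneous settings.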

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space

- Computer Science, NeurIPS
- 2021

The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.

Advantage of Deep Neural Networks for Estimating Functions with Singularity on Curves

- Computer Science, Mathematics, ArXiv
- 2020

The generalization error of a DNN estimator is derived, and it is proved that its convergence rate is almost optimal, while a certain class of common models, including linear estimators and other harmonic-analysis methods such as wavelets and curvelets, is sub-optimal.

Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces

- Computer Science, Mathematics
- 2020

It is argued that deep learning has an advantage over other standard models in terms of the generalization error when the target function f has singularities on a hypersurface in the domain.

Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space

- Computer Science, ArXiv
- 2020

The improvement due to adaptivity is remarkable when the region on which the target function is less smooth is small and the dimension is large; superiority to linear estimators is shown with respect to the convergence rate of the estimation error.

Harmless Overparametrization in Two-layer Neural Networks

- Computer Science, ArXiv
- 2021

This work presents a generalization theory for overparametrized ReLU networks by incorporating an explicit regularizer based on the scaled variation norm, which is equivalent to ridge regularization from the perspective of gradient-based optimization, but similar to the group lasso in terms of controlling model complexity.

Fast generalization error bound of deep learning without scale invariance of activation functions

- Computer Science, Neural Networks
- 2020

Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

- Computer Science, ArXiv
- 2022

This work investigates the excess risk of two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. It shows that the student network provably reaches a near-global optimal solution and outperforms any kernel-method estimator, including the neural tangent kernel approach and the random feature model, in the sense of the minimax optimal rate.

Convergence Rates of Variational Inference in Sparse Deep Learning

- Computer Science, ICML
- 2020

This paper shows that variational inference for sparse deep learning retains the same generalization properties as exact Bayesian inference, and highlights the connection between estimation and approximation theories via the classical bias-variance trade-off.

Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

- Computer Science, Mathematics, ArXiv
- 2021

This work sheds light on the phenomenon that neural networks seem to break the curse of dimensionality when the data-generating function belongs to the second-order Radon-domain bounded variation space, and shows that the neural network estimators are minimax optimal up to logarithmic factors.

## References

Showing 1–10 of 42 references.

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality

- Computer Science, ICLR
- 2019

A new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness shows that deep learning has higher adaptivity to the spatial inhomogeneity of the target function than other estimators such as linear ones.

Deep Neural Networks Learn Non-Smooth Functions Effectively

- Computer Science, AISTATS
- 2019

It is shown that DNN estimators are almost optimal for estimating non-smooth functions, while some popular models do not attain the optimal rate.

Optimal approximation of piecewise smooth functions using deep ReLU neural networks

- Computer Science, Neural Networks
- 2018

Adaptive minimax regression estimation over sparse lq-hulls

- Computer Science, J. Mach. Learn. Res.
- 2014

The authors' universal aggregation strategies by model mixing achieve the optimal rates simultaneously over the full range 0 ≤ q ≤ 1 for any Mn and without knowledge of the lq-norm of the best linear coefficients to represent the regression function.

Minimax estimation via wavelet shrinkage

- Mathematics, Computer Science
- 1998

A nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple threshold nonlinear estimators are nearly minimax.

Nonparametric regression using deep neural networks with ReLU activation function

- Computer Science, ArXiv
- 2017

The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.

Neural Network with Unbounded Activations is Universal Approximator

- Computer Science, Mathematics, ArXiv
- 2015

Deep Learning

- Computer Science, Nature
- 2015

Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.

Error bounds for approximations with deep ReLU networks

- Computer Science, Neural Networks
- 2017

Minimax estimation of linear functionals over nonconvex parameter spaces

- Mathematics
- 2004

The minimax theory for estimating linear functionals is extended to the case of a finite union of convex parameter spaces. Upper and lower bounds for the minimax risk can still be described in terms…