# Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality

@article{Suzuki2019AdaptivityOD,
  title={Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality},
  author={Taiji Suzuki},
  journal={ArXiv},
  year={2019},
  volume={abs/1810.08033}
}

Deep learning has shown high performance in various types of tasks, from visual recognition to natural language processing, which indicates the superior flexibility and adaptivity of deep learning. To understand this phenomenon theoretically, we develop a new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness. The Besov space is a considerably general function space including the Hölder space and…
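The abstract concerns ReLU networks and non-smooth function classes; a minimal NumPy sketch (an illustration, not code from the paper) of why ReLU units handle kinks that smooth approximants struggle with:

```python
import numpy as np

def relu(x):
    """ReLU activation: max(x, 0) elementwise."""
    return np.maximum(x, 0.0)

# A two-unit ReLU network represents the non-smooth function |x| exactly:
# |x| = relu(x) + relu(-x).  Fixed-degree polynomials incur unavoidable
# error near the kink at 0, but two ReLU units reproduce it exactly.
x = np.linspace(-1.0, 1.0, 201)
y = relu(x) + relu(-x)  # identical to np.abs(x)
```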

## 135 Citations

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space

- Computer Science, NeurIPS
- 2021

The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.

On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces

- Computer Science, Neural Networks
- 2020

Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space

- Computer Science, ArXiv
- 2020

The improvement based on adaptivity is remarkable when the region upon which the target function has less smoothness is small and the dimension is large, and the superiority to linear estimators is shown with respect to the convergence rate of the estimation error.

Fast generalization error bound of deep learning without scale invariance of activation functions

- Computer Science, Neural Networks
- 2020

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks

- Computer Science, ICML
- 2021

This work establishes theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification, and proves that if the network architecture is properly chosen, ConvResNets can approximate Besov functions on manifolds with arbitrary accuracy.

Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?

- Computer Science, ArXiv
- 2022

It is established that, by tuning only the weight decay, such a parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes, and that the error gets exponentially closer to minimax optimal as the NN gets deeper.

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

- Computer Science, ICLR
- 2021

It is shown that deep learning can outperform any linear estimator in the sense of the minimax optimal rate, especially in high-dimensional settings, and a so-called fast learning rate is obtained.

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

- Computer Science, ArXiv
- 2022

This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks, investigates how the network structure influences the generalization error of the neural network estimator, and offers quantitative guidance on choosing the network structure to maximize learning efficiency.

Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces

- Computer Science, Mathematics
- 2020

It is argued that deep learning has an advantage over other standard models in terms of the generalization error when f has singularities on a hypersurface in the domain.

Minimax Optimal Deep Neural Network Classifiers Under Smooth Decision Boundary

- Computer Science
- 2022

It is shown that DNN classifiers can adapt to low-dimensional data structures and circumvent the “curse of dimensionality” in the sense that the minimax rate only depends on the effective dimension, potentially much smaller than the actual data dimension.

## References

Showing 1–10 of 64 references

Optimal approximation of piecewise smooth functions using deep ReLU neural networks

- Computer Science, Neural Networks
- 2018

Deep Neural Networks Learn Non-Smooth Functions Effectively

- Computer Science, AISTATS
- 2019

It is shown that the estimators by DNNs are almost optimal to estimate the non-smooth functions, while some of the popular models do not attain the optimal rate.

Minimax estimation via wavelet shrinkage

- Mathematics, Computer Science
- 1998

A nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple threshold nonlinear estimators are nearly minimax.
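The shrinkage rule described above can be sketched in a few lines; a minimal NumPy illustration, assuming the standard soft-thresholding rule with the Donoho–Johnstone universal threshold (not code from the paper):

```python
import numpy as np

def soft_threshold(c, t):
    """Soft-thresholding: shrink each coefficient toward zero by t,
    zeroing anything with magnitude below t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

# Universal threshold t = sigma * sqrt(2 log n): applied to empirical
# wavelet coefficients, it suppresses pure-noise coefficients while
# retaining (shrunken) large signal coefficients.
rng = np.random.default_rng(0)
n, sigma = 256, 0.5
coeffs = np.zeros(n)
coeffs[:8] = 3.0                      # sparse signal in the wavelet domain
noisy = coeffs + sigma * rng.standard_normal(n)
t = sigma * np.sqrt(2.0 * np.log(n))
denoised = soft_threshold(noisy, t)   # mostly zeros, large entries kept
```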

Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning

- Computer Science, NIPS
- 2016

It is shown that the alternating minimization method achieves linear convergence as an optimization algorithm and that the generalization error of the resultant estimator yields the minimax optimality.

Gaussian process nonparametric tensor estimator and its minimax optimality

- Computer Science, ICML
- 2016

A nonparametric Bayesian method based on Gaussian processes is proposed for multi-task learning, and it is shown that this method significantly outperforms existing methods in numerical experiments on real-world data sets.

Neural Network with Unbounded Activations is Universal Approximator

- Computer Science, Mathematics, ArXiv
- 2015

Exponential expressivity in deep neural networks through transient chaos

- Computer Science, NIPS
- 2016

The theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.

Nonparametric regression using deep neural networks with ReLU activation function

- Computer Science, ArXiv
- 2017

The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.

Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

- Computer Science, J. Mach. Learn. Res.
- 2012

This paper considers the case where each univariate component function fj* lies in a reproducing kernel Hilbert space (RKHS), and analyzes a method for estimating the unknown function f* based on kernels combined with l1-type convex regularization, obtaining optimal minimax rates for many interesting classes of sparse additive models.

On the Number of Linear Regions of Deep Neural Networks

- Computer Science, NIPS
- 2014

We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep…