• Corpus ID: 53015027

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality

  • Taiji Suzuki
  • Published 27 September 2018
  • Computer Science
  • ArXiv
Deep learning has shown high performance in various types of tasks, from visual recognition to natural language processing, which indicates the superior flexibility and adaptivity of deep learning. To understand this phenomenon theoretically, we develop a new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness. The Besov space is a considerably general function space including the Hölder space and… 

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.
On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces
Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space
The improvement based on adaptivity is remarkable when the region upon which the target function has less smoothness is small and the dimension is large, and the superiority to linear estimators is shown with respect to the convergence rate of the estimation error.
Fast generalization error bound of deep learning without scale invariance of activation functions
Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks
This work establishes theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification, and proves that if the network architecture is properly chosen, ConvResNets can approximate Besov functions on manifolds with arbitrary accuracy.
Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?
It is established that, by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes, and that the error gets exponentially closer to minimax optimal as the NN gets deeper.
Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods
It is shown that deep learning can outperform any linear estimator in the sense of the minimax optimal rate, especially in high-dimensional settings, and that a so-called fast learning rate is obtained.
Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces
This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks, investigates the influence of network structure on the generalization error of the neural network estimator, and offers general guidance on choosing the network structure to maximize learning efficiency.
Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces
It is argued that deep learning has an advantage over other standard models in terms of the generalization error when f has singularities on a hypersurface in the domain.
Minimax Optimal Deep Neural Network Classifiers Under Smooth Decision Boundary
It is shown that DNN classifiers can adapt to low-dimensional data structures and circumvent the “curse of dimensionality” in the sense that the minimax rate only depends on the effective dimension, potentially much smaller than the actual data dimension.


Optimal approximation of piecewise smooth functions using deep ReLU neural networks
Deep Neural Networks Learn Non-Smooth Functions Effectively
It is shown that estimators based on DNNs are almost optimal for estimating non-smooth functions, while some popular models do not attain the optimal rate.
Minimax estimation via wavelet shrinkage
A nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple threshold nonlinear estimators are nearly minimax.
Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning
It is shown that the alternating minimization method achieves linear convergence as an optimization algorithm and that the generalization error of the resultant estimator yields the minimax optimality.
Gaussian process nonparametric tensor estimator and its minimax optimality
A non-parametric Bayesian method based on the Gaussian process method is proposed for multi-task learning and it is shown that this method significantly outperforms existing methods through numerical experiments on real-world data sets.
Neural Network with Unbounded Activations is Universal Approximator
Exponential expressivity in deep neural networks through transient chaos
The theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides a quantitative underpinning for previously abstract notions about the geometry of deep functions.
Nonparametric regression using deep neural networks with ReLU activation function
The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.
Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
This paper considers the case where each univariate component function fj* lies in a reproducing kernel Hilbert space (RKHS), and analyzes a method for estimating the unknown function f* based on kernels combined with l1-type convex regularization, obtaining optimal minimax rates for many interesting classes of sparse additive models.
On the Number of Linear Regions of Deep Neural Networks
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep