Corpus ID: 246652664

Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces

Masaaki Imaizumi, Kenji Fukumizu
We develop a minimax rate analysis to describe the reason that deep neural networks (DNNs) perform better than other standard methods. For nonparametric regression problems, it is well known that many standard methods attain the minimax optimal rate of estimation errors for smooth functions, and thus, it is not straightforward to identify the theoretical advantages of DNNs. This study tries to fill this gap by considering the estimation for a class of non-smooth functions that have singularities… 
1 Citation

On the inability of Gaussian process regression to optimally learn compositional functions
We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower


Optimal approximation of piecewise smooth functions using deep ReLU neural networks
Deep Neural Networks Learn Non-Smooth Functions Effectively
It is shown that DNN estimators are nearly optimal for estimating non-smooth functions, whereas some popular models do not attain the optimal rate.
Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality
A new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness shows that deep learning has higher adaptivity to the spatial inhomogeneity of the target function than other estimators such as linear ones.
Adaptive Approximation and Generalization of Deep Neural Network with Intrinsic Dimensionality
This study derives bounds for an approximation error and a generalization error regarding DNNs with intrinsically low dimensional covariates and proves that an intrinsic low dimensionality of covariates is the main factor that determines the performance of deep neural networks.
Minimax estimation via wavelet shrinkage
A nonlinear method that works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple threshold nonlinear estimators are nearly minimax.
Neural Networks for Optimal Approximation of Smooth and Analytic Functions
H. Mhaskar. Neural Computation, 1996. Fields: Mathematics, Computer Science.
We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation
Nonparametric regression using deep neural networks with ReLU activation function
The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.
A Convergence Theory for Deep Learning via Over-Parameterization
This work proves why stochastic gradient descent can find global minima on the training objective of DNNs in polynomial time and implies an equivalence between over-parameterized neural networks and neural tangent kernel (NTK) in the finite (and polynomial) width setting.
Deep Learning without Poor Local Minima
In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. With no unrealistic assumption, we first