On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces

  • Satoshi Hayakawa, Taiji Suzuki
  • Published 22 May 2019
  • Computer Science
  • Neural Networks: The Official Journal of the International Neural Network Society


Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods
It is shown that deep learning outperforms any linear estimator in the sense of the minimax optimal rate, especially in high-dimensional settings, and a so-called fast learning rate is obtained.
Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.
Advantage of Deep Neural Networks for Estimating Functions with Singularity on Curves
The generalization error of a DNN estimator is derived and its convergence rate is proved to be almost optimal, while a certain class of common models, including linear estimators and harmonic analysis methods such as wavelets and curvelets, is shown to be sub-optimal.
Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces
It is argued that deep learning has an advantage over other standard models in terms of the generalization error when f has singularities on a hypersurface in the domain.
Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space
The improvement from adaptivity is remarkable when the region on which the target function has less smoothness is small and the dimension is large, and the superiority to linear estimators is shown with respect to the convergence rate of the estimation error.
Harmless Overparametrization in Two-layer Neural Networks
This work presents a generalization theory for overparametrized ReLU networks by incorporating an explicit regularizer based on the scaled variation norm, which is equivalent to ridge regularization from the angle of gradient-based optimization, but is similar to the group lasso in terms of controlling model complexity.
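The group-lasso-like penalty described in this summary can be sketched, for a two-layer ReLU network f(x) = Σ_j a_j ReLU(w_j · x), as Σ_j |a_j| ‖w_j‖₂ (a minimal illustrative sketch; the function names `relu_net` and `variation_norm` are assumptions for exposition, not the paper's implementation):

```python
import numpy as np

def relu_net(x, a, W):
    """Two-layer ReLU network: f(x) = sum_j a_j * relu(w_j . x)."""
    return np.maximum(W @ x, 0.0) @ a

def variation_norm(a, W):
    """Group-lasso-like penalty: sum_j |a_j| * ||w_j||_2.
    Each neuron's outer weight is scaled by its inner weight norm,
    so whole neurons can be driven to zero, controlling complexity."""
    return np.sum(np.abs(a) * np.linalg.norm(W, axis=1))

a = np.array([1.0, -2.0])
W = np.array([[3.0, 4.0],   # ||w_1||_2 = 5
              [0.0, 1.0]])  # ||w_2||_2 = 1
# penalty = |1|*5 + |-2|*1 = 7
```

Penalizing per-neuron groups rather than individual weights is what gives the lasso-like sparsity at the neuron level while remaining ridge-like along each neuron's gradient direction.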
Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods
This work investigates the excess risk of two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. It is shown that the student network provably reaches a near-global optimal solution and outperforms any kernel-method estimator, including the neural tangent kernel approach and random feature models, in the sense of the minimax optimal rate.
Convergence Rates of Variational Inference in Sparse Deep Learning
This paper shows that variational inference for sparse deep learning retains the same generalization properties as exact Bayesian inference, and highlights the connection between estimation and approximation theories via the classical bias-variance trade-off.
Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks
Light is shed on the phenomenon that neural networks seem to break the curse of dimensionality when the data-generating function belongs to the second-order Radon-domain bounded variation space, and it is shown that the neural network estimators are minimax optimal up to logarithmic factors.


Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality
A new approximation and estimation error analysis of deep learning with the ReLU activation for functions in a Besov space and its variant with mixed smoothness shows that deep learning has higher adaptivity to the spatial inhomogeneity of the target function than other estimators such as linear ones.
Deep Neural Networks Learn Non-Smooth Functions Effectively
It is shown that DNN estimators are almost optimal for estimating non-smooth functions, while some popular models do not attain the optimal rate.
Adaptive minimax regression estimation over sparse lq-hulls
Universal aggregation strategies by model mixing achieve the optimal rates simultaneously over the full range 0 ≤ q ≤ 1 for any Mn, without knowledge of the lq-norm of the best linear coefficients representing the regression function.
Minimax estimation via wavelet shrinkage
A nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple threshold nonlinear estimators are shown to be nearly minimax.
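The shrinkage rule summarized above can be sketched as soft thresholding of the empirical wavelet coefficients (a minimal NumPy sketch; `soft_threshold` and `universal_threshold` are illustrative names, with σ√(2 log n) being the classical universal threshold choice):

```python
import numpy as np

def soft_threshold(coeffs, lam):
    """Shrink each empirical wavelet coefficient toward zero by lam;
    coefficients with |c| <= lam are set exactly to zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)

def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 log n) for noise level sigma
    and sample size n."""
    return sigma * np.sqrt(2.0 * np.log(n))

coeffs = np.array([3.0, -0.5, 0.2, -4.0])
shrunk = soft_threshold(coeffs, 1.0)
# small coefficients vanish; large ones are shrunk by lam
```

Because most wavelet coefficients of a smooth-with-spikes signal are near zero, this simple nonlinear rule kills the noise-dominated coefficients while only mildly biasing the large ones, which is the mechanism behind the near-minimax guarantees.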
Nonparametric regression using deep neural networks with ReLU activation function
The theory suggests that, for nonparametric regression, scaling the network depth with the sample size is natural, and the analysis gives some insight into why multilayer feedforward neural networks perform well in practice.
Neural Network with Unbounded Activations is Universal Approximator
Deep Learning
Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years. It will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Error bounds for approximations with deep ReLU networks
Minimax estimation of linear functionals over nonconvex parameter spaces
The minimax theory for estimating linear functionals is extended to the case of a finite union of convex parameter spaces. Upper and lower bounds for the minimax risk can still be described in terms