Corpus ID: 59425874

Deep Neural Networks Learn Non-Smooth Functions Effectively

@inproceedings{Imaizumi2019DeepNN,
  title={Deep Neural Networks Learn Non-Smooth Functions Effectively},
  author={Masaaki Imaizumi and Kenji Fukumizu},
  booktitle={AISTATS},
  year={2019}
}
We theoretically discuss why deep neural networks (DNNs) perform better than other models in some cases by investigating statistical properties of DNNs for non-smooth functions. While DNNs have empirically shown higher performance than other standard methods, understanding the mechanism behind this is still a challenging problem. From the viewpoint of statistical theory, it is known that many standard methods attain optimal convergence rates, and thus it has been difficult to find theoretical advantages of…
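As context for the "optimal convergence rates" the abstract refers to, here is a brief sketch of the rate comparison, with n the sample size, d the input dimension, β the smoothness of the smooth pieces, and α the smoothness of the region boundaries (these symbols are my shorthand for the paper's setup). The first display is the classical minimax rate for β-Hölder regression; the second gives only the approximate form of the rate for piecewise smooth targets, up to logarithmic factors, with the precise assumptions left to the paper.

```latex
% Classical minimax rate for estimating a \beta-Hölder regression
% function on [0,1]^d from n noisy samples; many standard methods
% (kernel, series, local polynomial estimators) attain it:
\[
  \inf_{\hat f}\,\sup_{f \in \mathcal{H}^{\beta}}
    \mathbb{E}\,\|\hat f - f\|_{L^2}^2
  \;\asymp\; n^{-\frac{2\beta}{2\beta + d}} .
\]
% For piecewise smooth targets (\beta-smooth pieces joined along
% \alpha-smooth boundaries), deep ReLU networks attain, up to
% logarithmic factors, roughly
\[
  \max\!\Bigl\{\, n^{-\frac{2\beta}{2\beta + d}},\;
                  n^{-\frac{\alpha}{\alpha + d - 1}} \,\Bigr\},
\]
% while linear estimators (fixed-basis methods) are provably slower
% on this class, which is the claimed theoretical advantage of DNNs.
```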


Adaptive Approximation and Estimation of Deep Neural Network to Intrinsic Dimensionality
It is theoretically proved that the generalization performance of deep neural networks (DNNs) is mainly determined by an intrinsic low-dimensional structure of the data, and that DNNs outperform other non-parametric estimators which are also adaptive to the intrinsic dimension.
Advantage of Deep Neural Networks for Estimating Functions with Singularity on Curves
The generalization error of a DNN estimator is derived and its convergence rate is proved to be almost optimal, while a certain class of common models is shown to be sub-optimal, including linear estimators and other harmonic analysis methods such as wavelets and curvelets.
Adaptive Approximation and Generalization of Deep Neural Network with Intrinsic Dimensionality
This study derives bounds on the approximation error and the generalization error of DNNs with intrinsically low-dimensional covariates, and proves that the intrinsic low dimensionality of the covariates is the main factor determining the performance of deep neural networks.
Estimation of a Function of Low Local Dimensionality by Deep Neural Networks
It is shown that least squares regression estimates using DNNs are able to achieve dimensionality reduction in the case that the regression function has locally low dimensionality.
Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces
It is argued that deep learning has an advantage over other standard models in terms of the generalization error when f has singularities on a hypersurface in the domain.
Fast convergence rates of deep neural networks for classification
Analysis of the rate of convergence of neural network regression estimates which are easy to implement.
This article introduces a new neural network regression estimate in which most of the weights are chosen independently of the data, motivated by recent approximation results for neural networks; the estimate is therefore easy to implement and achieves the one-dimensional rate of convergence.
Spectral-Pruning: Compressing deep neural network via spectral analysis
This work develops a new theoretical framework for model compression and proposes a new method called Spectral-Pruning based on that theory; the method makes use of both the "input" and "output" of each layer and is easy to implement.
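The summary above only states that Spectral-Pruning uses per-layer input and output information; the snippet below is a generic low-rank compression sketch, not the paper's actual criterion, meant only to illustrate the general idea of spectrally compressing a dense layer. The function name, the energy threshold, and the choice of an SVD of the weight matrix are all assumptions for illustration.

```python
import numpy as np

def low_rank_compress(W, energy=0.95):
    """Replace weight matrix W by a rank-r factorization W ~= U_r @ V_r,
    with r chosen so the kept singular values carry `energy` of the
    total squared spectrum (a generic criterion, not Spectral-Pruning's)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy)) + 1
    U_r = U[:, :r] * s[:r]   # fold singular values into the left factor
    V_r = Vt[:r, :]
    return U_r, V_r          # two thinner layers replace the original one

# Toy usage: a 512x512 weight matrix with fast spectral decay.
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 64)) @ rng.normal(size=(64, 512)) / 64
U_r, V_r = low_rank_compress(W)
print(U_r.shape, V_r.shape,
      np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W))
```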
On the rate of convergence of fully connected very deep neural network regression estimates
This paper shows that similar results can also be obtained for least squares estimates based on simple fully connected neural networks with ReLU activation functions, using new approximation results for deep neural networks.

References

Showing 1–10 of 58 references
Optimal approximation of piecewise smooth functions using deep ReLU neural networks
Nonparametric regression using deep neural networks with ReLU activation function
The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural, and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite sample expressivity.
Fast generalization error bound of deep learning from a kernel perspective
A new theoretical framework to analyze the generalization error of deep learning is developed, and a new fast learning rate is derived for two representative algorithms: empirical risk minimization and Bayesian deep learning.
Memory-optimal neural network approximation
Stochastic gradient descent is found to actually learn approximations that are sparse in the representation system that optimally sparsifies the function class the network is trained on; this elucidates a remarkable universality property of deep neural networks and shows that they achieve the optimal approximation properties of all affine systems combined.
Optimal Approximation with Sparsely Connected Deep Neural Networks
All function classes that are optimally approximated by a general class of representation systems (so-called affine systems) can be approximated by deep neural networks with minimal connectivity and memory requirements, and it is proved that the lower bounds are achievable for a broad family of function classes.
On the Number of Linear Regions of Deep Neural Networks
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have.
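To make the notion of "linear regions" concrete, here is a small NumPy sketch (not from the paper) that lower-bounds the number of linear regions of a random two-hidden-layer ReLU network on a 1-D input by counting distinct activation patterns on a fine grid; the architecture, widths, and grid are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small random ReLU network R -> R with two hidden layers of width 16.
W1, b1 = rng.normal(size=(16, 1)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

def activation_pattern(x):
    """Which ReLU units are active at scalar input x (the output layer
    is affine and does not create new regions, so it is omitted)."""
    h1 = W1 @ np.array([x]) + b1
    h2 = W2 @ np.maximum(h1, 0.0) + b2
    return tuple(h1 > 0) + tuple(h2 > 0)

# Distinct activation patterns along the line correspond to distinct
# linear pieces, so counting them on a fine grid lower-bounds the
# number of linear regions on [-5, 5].
grid = np.linspace(-5.0, 5.0, 20_000)
patterns = {activation_pattern(x) for x in grid}
print(len(patterns))
```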
Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks
This work presents a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP), which works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
No bad local minima: Data independent training error guarantees for multilayer neural networks
It is proved that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization, and the result is extended to the case of more than one hidden layer.