Corpus ID: 59425874

Deep Neural Networks Learn Non-Smooth Functions Effectively

Masaaki Imaizumi and Kenji Fukumizu
We theoretically discuss why deep neural networks (DNNs) perform better than other models in some cases by investigating statistical properties of DNNs for non-smooth functions. While DNNs have empirically shown higher performance than other standard methods, understanding the mechanism behind this is still a challenging problem. From the perspective of statistical theory, it is known that many standard methods attain optimal convergence rates, and thus it has been difficult to find theoretical advantages of… 
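The non-smooth functions discussed in the abstract can be made concrete with a small illustration (not taken from the paper): a one-hidden-layer ReLU network represents the non-differentiable function |x| exactly, and a pair of ReLUs forms a steep ramp that approximates a jump discontinuity to any accuracy.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# |x| is non-smooth at 0, yet a one-hidden-layer ReLU network with
# two units represents it exactly: |x| = relu(x) + relu(-x).
def abs_via_relu(x):
    return relu(x) + relu(-x)

# A jump (step) function is approximated arbitrarily well by two
# ReLUs forming a ramp of width eps around the discontinuity.
def step_via_relu(x, eps=1e-3):
    return relu(x / eps) - relu(x / eps - 1.0)

x = np.linspace(-1, 1, 5)
print(abs_via_relu(x))                        # matches np.abs(x)
print(step_via_relu(np.array([-0.5, 0.5])))   # → [0. 1.]
```

Shrinking `eps` makes the ramp steeper, which is the mechanism behind ReLU networks approximating functions with jumps.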


Adaptive Approximation and Estimation of Deep Neural Network to Intrinsic Dimensionality
It is theoretically proved that the generalization performance of deep neural networks (DNNs) is mainly determined by an intrinsic low-dimensional structure of data, and DNNs outperform other non-parametric estimators which are also adaptive to the intrinsic dimension.
Advantage of Deep Neural Networks for Estimating Functions with Singularity on Curves
The generalization error of a DNN estimator is derived and it is proved that its convergence rate is almost optimal, and a certain class of common models are sub-optimal, including linear estimators and other harmonic analysis methods such as wavelets and curvelets.
Adaptive Approximation and Generalization of Deep Neural Network with Intrinsic Dimensionality
This study derives bounds for an approximation error and a generalization error regarding DNNs with intrinsically low dimensional covariates and proves that an intrinsic low dimensionality of covariates is the main factor that determines the performance of deep neural networks.
Estimation of a Function of Low Local Dimensionality by Deep Neural Networks
It is shown that least squares regression estimates using DNNs are able to achieve dimensionality reduction in the case that the regression function has locally low dimensionality.
Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces
It is argued that deep learning has an advantage over other standard models in terms of the generalization error when f has singularities on a hypersurface in the domain.
Minimax Optimal Deep Neural Network Classifiers Under Smooth Decision Boundary
It is shown that DNN classifiers can adapt to low-dimensional data structures and circumvent the “curse of dimensionality” in the sense that the minimax rate only depends on the effective dimension, potentially much smaller than the actual data dimension.
Fast convergence rates of deep neural networks for classification
Analysis of the rate of convergence of neural network regression estimates which are easy to implement.
This article introduces a new neural network regression estimate in which most of the weights are chosen independently of the data, motivated by recent approximation results for neural networks; the estimate is therefore easy to implement and achieves the one-dimensional rate of convergence.
Spectral-Pruning: Compressing deep neural network via spectral analysis
This work develops a new theoretical frame-work for model compression, and proposes a new method called Spectral-Pruning based on the theory, which makes use of both "input" and "output" in each layer and is easy to implement.
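A hypothetical sketch of the spectral-compression idea, for illustration only: replace one dense layer's weight matrix with its top-k singular directions, turning one wide layer into two thin ones. The actual Spectral-Pruning method uses the empirical covariance of each layer's inputs and outputs; plain SVD of the weights is a simplifying assumption here.

```python
import numpy as np

def compress_layer(W, k):
    """Low-rank compression of a dense layer's weight matrix.

    Factors W ≈ (U_k * s_k) @ Vt_k, replacing one m×n layer with
    two thin layers of sizes m×k and k×n (fewer parameters when
    k is small relative to m and n).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
A, B = compress_layer(W, k=8)
print(A.shape, B.shape)  # (64, 8) (8, 32)
# Relative reconstruction error grows as k shrinks:
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

At k equal to the full rank the factorization reconstructs W exactly, so k trades compression against accuracy.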
References

Optimal approximation of piecewise smooth functions using deep ReLU neural networks
Nonparametric regression using deep neural networks with ReLU activation function
The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.
Understanding deep learning requires rethinking generalization
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
Fast generalization error bound of deep learning from a kernel perspective
A new theoretical framework to analyze the generalization error of deep learning is developed, and a new fast learning rate is derived for two representative algorithms: empirical risk minimization and Bayesian deep learning.
Memory-optimal neural network approximation
Stochastic gradient descent is found to learn approximations that are sparse in the representation system that optimally sparsifies the function class the network is trained on; this elucidates a remarkable universality property of deep neural networks and shows that they achieve the optimal approximation properties of all affine systems combined.
On the Number of Linear Regions of Deep Neural Networks
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have.
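The notion of a linear region can be demonstrated with a small sketch (an illustration, not the paper's counting technique): for a 1-D ReLU network, scan a fine grid and count changes in the hidden-unit activation pattern, since each pattern change marks a boundary between linear pieces.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def count_linear_regions(weights, biases, grid):
    """Estimate the number of linear regions of a 1-D ReLU network
    by counting distinct consecutive activation patterns on a grid."""
    patterns = []
    for x in grid:
        h = np.array([x])
        pattern = []
        for W, b in zip(weights, biases):
            pre = W @ h + b
            pattern.append(tuple(pre > 0))  # on/off state of each unit
            h = relu(pre)
        patterns.append(tuple(pattern))
    # Each change between consecutive patterns is a region boundary.
    return 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))

rng = np.random.default_rng(1)
sizes = [1, 8, 8]  # scalar input, two hidden layers of width 8
weights = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(2)]
biases = [rng.standard_normal(s) for s in sizes[1:]]
grid = np.linspace(-3, 3, 20001)
print(count_linear_regions(weights, biases, grid))
```

The grid-based count is a lower bound on the true number of regions; a finer grid catches more boundaries.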
Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks
This work presents a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP), which works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
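The Adam update rule described above can be sketched in a few lines: exponential moving averages of the gradient (first moment) and its elementwise square (second moment), with bias correction, scale each coordinate's step. The toy objective f(x) = x² is chosen here just to show convergence.

```python
import numpy as np

def adam(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Minimal Adam optimizer: adaptive steps from moment estimates."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (mean) estimate
    v = np.zeros_like(x)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for zero init
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = x^2, whose gradient is 2x.
x_min = adam(lambda x: 2 * x, x0=[5.0])
print(x_min)  # close to 0, the minimizer
```

Because the step is normalized by the second-moment estimate, early updates behave like fixed-size signed steps, which is what makes Adam robust to gradient scale.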
No bad local minima: Data independent training error guarantees for multilayer neural networks
It is proved that for a multilayer neural network (MNN) with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization; the result is extended to the case of more than one hidden layer.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.