# Deep Neural Networks Learn Non-Smooth Functions Effectively

```bibtex
@inproceedings{Imaizumi2019DeepNN,
  title     = {Deep Neural Networks Learn Non-Smooth Functions Effectively},
  author    = {Masaaki Imaizumi and Kenji Fukumizu},
  booktitle = {AISTATS},
  year      = {2019}
}
```

We theoretically discuss why deep neural networks (DNNs) perform better than other models in some cases by investigating statistical properties of DNNs for non-smooth functions. While DNNs have empirically shown higher performance than other standard methods, understanding their mechanism is still a challenging problem. From the viewpoint of statistical theory, it is known that many standard methods attain optimal convergence rates, and thus it has been difficult to find theoretical advantages of…
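The non-smooth targets studied in this line of work are piecewise-smooth functions: smooth pieces glued along a smooth boundary, so the function jumps or kinks across the boundary. A minimal sketch of such a target (the specific pieces and boundary curve below are illustrative choices, not taken from the paper):

```python
import numpy as np

def piecewise_smooth(x, y):
    """Two smooth pieces glued along the smooth boundary curve y = sin(pi * x).

    Above the boundary the function is x^2 + y; below it, cos(x * y).
    The function is smooth on each piece but discontinuous across the boundary.
    """
    return np.where(y > np.sin(np.pi * x), x ** 2 + y, np.cos(x * y))
```

Loosely, this is the kind of target on which the paper argues DNNs have an edge: a linear estimator must smooth across the discontinuity, while a ReLU network can localize it.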

## 78 Citations

Adaptive Approximation and Estimation of Deep Neural Network to Intrinsic Dimensionality

- Computer Science, ArXiv
- 2019

It is theoretically proved that the generalization performance of deep neural networks (DNNs) is mainly determined by an intrinsic low-dimensional structure of data, and DNNs outperform other non-parametric estimators which are also adaptive to the intrinsic dimension.

Advantage of Deep Neural Networks for Estimating Functions with Singularity on Curves

- Computer Science, Mathematics, ArXiv
- 2020

The generalization error of a DNN estimator is derived and it is proved that its convergence rate is almost optimal, and a certain class of common models are sub-optimal, including linear estimators and other harmonic analysis methods such as wavelets and curvelets.

Adaptive Approximation and Generalization of Deep Neural Network with Intrinsic Dimensionality

- Computer Science, J. Mach. Learn. Res.
- 2020

This study derives approximation and generalization error bounds for DNNs with intrinsically low-dimensional covariates and proves that the intrinsic low dimensionality of the covariates is the main factor determining the performance of deep neural networks.

On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces

- Computer Science, Neural Networks
- 2020

Estimation of a Function of Low Local Dimensionality by Deep Neural Networks

- Computer Science, IEEE Transactions on Information Theory
- 2022

It is shown that the least squares regression estimates using DNNs are able to achieve dimensionality reduction in case that the regression function has locally low dimensionality.

Advantage of Deep Neural Networks for Estimating Functions with Singularity on Hypersurfaces

- Computer Science, Mathematics
- 2020

It is argued that deep learning has an advantage over other standard models in terms of the generalization error when f has singularities on a hypersurface in the domain.

Minimax Optimal Deep Neural Network Classifiers Under Smooth Decision Boundary

- Computer Science
- 2022

It is shown that DNN classifiers can adapt to low-dimensional data structures and circumvent the "curse of dimensionality" in the sense that the minimax rate only depends on the effective dimension, which is potentially much smaller than the actual data dimension.

Fast convergence rates of deep neural networks for classification

- Computer Science, Neural Networks
- 2021

Analysis of the rate of convergence of neural network regression estimates which are easy to implement.

- Computer Science, Mathematics
- 2019

This article introduces a new neural network regression estimate where most of the weights are chosen regardless of the data motivated by some recent approximation results for neural networks, and which is therefore easy to implement and which achieves the one-dimensional rate of convergence.

Spectral-Pruning: Compressing deep neural network via spectral analysis

- Computer Science, ArXiv
- 2018

This work develops a new theoretical framework for model compression and proposes a new method called Spectral-Pruning based on the theory, which makes use of both "input" and "output" in each layer and is easy to implement.
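Spectral-Pruning itself is not reproduced here; as a rough illustration of spectrum-based compression, the sketch below replaces a single weight matrix with its best rank-k approximation via truncated SVD (a deliberate simplification — the actual method uses the spectra of each layer's inputs and outputs, not of the weights alone):

```python
import numpy as np

def low_rank_compress(W, k):
    """Best rank-k approximation of a weight matrix via truncated SVD.

    Keeping the k largest singular values discards the directions that
    contribute least to the layer's linear map.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 5))
W_small = low_rank_compress(W, 2)   # rank at most 2
```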

## References

Showing 1–10 of 58 references

Optimal approximation of piecewise smooth functions using deep ReLU neural networks

- Computer Science, Neural Networks
- 2018

Nonparametric regression using deep neural networks with ReLU activation function

- Computer Science, ArXiv
- 2017

The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.

Understanding deep learning requires rethinking generalization

- Computer Science, ICLR
- 2017

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

Fast generalization error bound of deep learning from a kernel perspective

- Computer Science, AISTATS
- 2018

A new theoretical framework to analyze the generalization error of deep learning is developed, and a new fast learning rate is derived for two representative algorithms: empirical risk minimization and Bayesian deep learning.

Memory-optimal neural network approximation

- Computer Science, Optical Engineering + Applications
- 2017

Stochastic gradient descent is found to actually learn approximations that are sparse in the representation system optimally sparsifying the function class the network is trained on; this elucidates a remarkable universality property of deep neural networks and shows that they achieve the optimum approximation properties of all affine systems combined.

On the Number of Linear Regions of Deep Neural Networks

- Computer Science, NIPS
- 2014

We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep…
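In one dimension the region count can be made concrete: each distinct on/off pattern of the ReLU units corresponds to one linear piece, so counting pattern changes along a fine grid counts the pieces. A minimal sketch (one hidden layer only; the function name and grid-based counting are our own illustrative choices):

```python
import numpy as np

def count_linear_regions_1d(w, b, xs):
    """Count linear pieces of x -> sum_i relu(w_i * x + b_i) along a 1-D grid.

    The activation pattern (which units are "on") determines the linear
    piece, so we count pattern changes between neighboring grid points.
    """
    patterns = (np.outer(xs, w) + b > 0)              # shape (len(xs), n_units)
    changes = np.any(patterns[1:] != patterns[:-1], axis=1)
    return int(changes.sum()) + 1

# Two units with breakpoints at x = 0 and x = 1 give three linear pieces.
xs = np.linspace(-2.0, 2.0, 1001)
n = count_linear_regions_1d(np.array([1.0, 1.0]), np.array([0.0, -1.0]), xs)
```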

Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks

- Computer Science, ICML
- 2015

This work presents a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP), which works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients.

Adam: A Method for Stochastic Optimization

- Computer Science, ICLR
- 2015

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
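The update rule described above can be sketched in a few lines: exponential moving averages of the gradient and the squared gradient, bias-corrected, then a per-step rescaled move. A minimal scalar version with the standard default hyperparameters (not the reference implementation):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update from adaptive estimates of the first two moments."""
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta**2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

The moment estimates make the effective step size roughly invariant to gradient rescaling, which is the property the paper's regret analysis exploits.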

No bad local minima: Data independent training error guarantees for multilayer neural networks

- Computer Science, ArXiv
- 2016

It is proved that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization, and the result is extended to the case of more than one hidden layer.

Deep Residual Learning for Image Recognition

- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
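The core construction is small: the block computes y = x + F(x), so it only has to learn the residual F(x) = H(x) - x, and with F ≡ 0 it reduces to the identity map — which is why very deep stacks remain optimizable. A minimal sketch (plain matrices standing in for convolutions; names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = x + F(x) with residual branch F(x) = W2 @ relu(W1 @ x).

    The identity skip connection carries x through unchanged, so the
    block defaults to the identity when the residual branch is zero.
    """
    return x + W2 @ relu(W1 @ x)

# With an all-zero residual branch, the block is exactly the identity.
x = np.array([1.0, -2.0, 3.0])
W1 = np.zeros((4, 3))
W2 = np.zeros((3, 4))
y = residual_block(x, W1, W2)
```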