# Approximation Spaces of Deep Neural Networks

@article{Gribonval2019ApproximationSO,
title={Approximation Spaces of Deep Neural Networks},
author={R{\'e}mi Gribonval and Gitta Kutyniok and Morten Nielsen and Felix Voigtl{\"a}nder},
journal={Constructive Approximation},
year={2019},
volume={55},
pages={259-367}
}
• Published 3 May 2019
• Computer Science, Mathematics
• Constructive Approximation
We study the expressivity of deep neural networks. Measuring a network’s complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We…
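The central notion here (the error of best approximation decaying at some rate as the complexity budget grows) can be illustrated with a minimal, purely illustrative sketch that is not from the paper: a ReLU network of width O(n) can realize any continuous piecewise-linear function with n pieces, so the n-piece linear interpolant of a smooth target gives an upper bound on the best-approximation error at complexity n. For f(x) = x² on [0, 1] the sup-norm error of this interpolant is exactly 1/(4n²), i.e. the rate is O(n⁻²).

```python
def interp_error(f, n, samples=2000):
    """Sup-norm error of the n-piece piecewise-linear interpolant of f on [0,1].

    A ReLU network of width O(n) can represent this interpolant exactly, so
    this is an upper bound on the best-approximation error at complexity n.
    """
    knots = [i / n for i in range(n + 1)]
    vals = [f(x) for x in knots]
    err = 0.0
    for j in range(samples + 1):
        x = j / samples
        i = min(int(x * n), n - 1)      # which piece x falls into
        t = (x - knots[i]) * n          # local coordinate in [0, 1]
        approx = (1 - t) * vals[i] + t * vals[i + 1]
        err = max(err, abs(f(x) - approx))
    return err

f = lambda x: x * x
for n in (4, 8, 16, 32):
    print(n, interp_error(f, n))  # error = 1/(4 n^2): doubling n quarters it
```

Doubling the budget n quarters the error, which is exactly the kind of decay rate that defines membership in an approximation space.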

### Sobolev-type embeddings for neural network approximation spaces

• Mathematics, Computer Science
ArXiv
• 2021
It is found that, analogous to the case of classical function spaces, it is possible to trade "smoothness" (i.e., approximation rate) for increased integrability in neural network approximation spaces. Moreover, an optimal "learning" algorithm for reconstructing functions that are well approximable by ReLU neural networks is given simply by piecewise-constant interpolation on a tensor product grid.
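The reconstruction rule named above, piecewise-constant interpolation on a tensor product grid, is simple enough to sketch directly. The following toy is illustrative only (not the paper's code): in two dimensions, each grid cell returns the function value sampled at its lower-left node, and for a Lipschitz function the sup-norm error decays like O(1/n) in the grid resolution n.

```python
def pc_interp(f, n):
    """Piecewise-constant interpolant of f on [0,1]^2 over an n-by-n tensor
    product grid: each cell returns f sampled at the cell's lower-left node."""
    def g(x, y):
        i = min(int(x * n), n - 1)  # cell index along x
        j = min(int(y * n), n - 1)  # cell index along y
        return f(i / n, j / n)
    return g

def sup_error(f, g, m=100):
    """Approximate sup-norm distance between f and g on a fine (m+1)^2 grid."""
    return max(abs(f(i / m, j / m) - g(i / m, j / m))
               for i in range(m + 1) for j in range(m + 1))

f = lambda x, y: x + 0.5 * y  # a Lipschitz test function
for n in (4, 8, 16):
    print(n, sup_error(f, pc_interp(f, n)))  # error shrinks roughly like 1/n
```

Note that only function samples on the grid are used, which is what makes this a "learning" algorithm in the paper's sense: no access to the network realizing the target is needed.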

### ReLU Network Approximation in Terms of Intrinsic Parameters

• Computer Science, Mathematics
ICML
• 2022
This paper shows that the number of parameters that need to be learned can be significantly smaller than people typically expect, and conducts several experiments to verify that training a small part of the parameters can also achieve good results for classification problems if the other parameters are pre-specified or pre-trained from a related problem.

### Approximation with Tensor Networks. Part III: Multivariate Approximation

• Computer Science
ArXiv
• 2021
Tensor networks exhibit universal expressivity w.r.t. isotropic, anisotropic and mixed smoothness spaces that is comparable with more general neural networks families such as deep rectified linear unit (ReLU) networks.

### Simultaneous Neural Network Approximations in Sobolev Spaces

• Computer Science
ArXiv
• 2021
This work shows that deep ReLU networks of width $O(N \log N)$ and depth $O(L \log L)$ can achieve a non-asymptotic approximation rate of $O(N^{-2(s-1)/d} L^{-2(s-1)/d})$ with respect to the $W^{1,p}([0,1]^d)$ norm for $p \in [1,\infty)$.

### Do ReLU Networks Have An Edge When Approximating Compactly-Supported Functions?

• Computer Science, Mathematics
• 2022
It is shown that polynomial regressors and analytic feedforward networks are not universal in this space, and a quantitative uniform version of the universal approximation theorem is derived on the dense subclass of compactly-supported Lipschitz functions.

### How degenerate is the parametrization of neural networks with the ReLU activation function?

• Computer Science, Mathematics
NeurIPS
• 2019
The pathologies that prevent inverse stability in general are presented, and it is shown that, by optimizing over suitably restricted sets of parametrizations on which inverse stability holds, it is still possible to learn any function that can be learned by optimizing over unrestricted sets.

### Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay

• Computer Science
ArXiv
• 2021
It is proved that for a family of non-smooth activation functions, including ReLU, approximating any single neuron with random features suffers from the curse of dimensionality, providing an explicit separation of expressiveness between neural networks and random feature models.

## References

Showing 1–10 of 71 references

### Optimal Approximation with Sparsely Connected Deep Neural Networks

• Computer Science
SIAM J. Math. Data Sci.
• 2019
All function classes that are optimally approximated by a general class of representation systems (so-called affine systems) can be approximated by deep neural networks with minimal connectivity and memory requirements, and it is proved that the lower bounds are achievable for a broad family of function classes.

### Optimal approximation of continuous functions by very deep ReLU networks

It is proved that constant-width fully-connected networks of depth $L \sim W$ provide the fastest possible approximation rate $\|f-\widetilde f\|_\infty = O(\omega_f(O(W^{-2/\nu})))$, which cannot be achieved with shallower networks.

### On the Expressive Power of Deep Learning: A Tensor Analysis

• Computer Science
COLT 2016
• 2015
It is proved that, apart from a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size in order to be realized (or even approximated) by a shallow network.

### Deep vs. shallow networks : An approximation theory perspective

• Computer Science
ArXiv
• 2016
A new definition of relative dimension is proposed to encapsulate different notions of sparsity of a function class that can possibly be exploited by deep networks but not by shallow ones to drastically reduce the complexity required for approximation and learning.

### Neural Networks for Optimal Approximation of Smooth and Analytic Functions

• Mathematics, Computer Science
Neural Computation
• 1996
We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation…

### Nonparametric regression using deep neural networks with ReLU activation function

The theory suggests that for nonparametric regression, scaling the network depth with the sample size is natural and the analysis gives some insights into why multilayer feedforward neural networks perform well in practice.

### Nearly-tight VC-dimension and Pseudodimension Bounds for Piecewise Linear Neural Networks

• Computer Science, Mathematics
J. Mach. Learn. Res.
• 2019
New upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function are proved; the dependence on depth is none for piecewise-constant activations, linear for piecewise-linear, and no more than quadratic for general piecewise-polynomial activations.