Approximation Spaces of Deep Neural Networks

  • R. Gribonval, G. Kutyniok, M. Nielsen, F. Voigtländer
  • Constructive Approximation
We study the expressivity of deep neural networks. Measuring a network’s complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We… 

Sobolev-type embeddings for neural network approximation spaces

It is found that, as for classical function spaces, it is possible to trade “smoothness” (i.e., approximation rate) for increased integrability in neural network approximation spaces. Moreover, an optimal “learning” algorithm for reconstructing functions that are well approximable by ReLU neural networks is given simply by piecewise-constant interpolation on a tensor-product grid.
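The reconstruction scheme mentioned above can be sketched as follows (a minimal illustration only, not the paper's construction; the helper `pc_interpolate` and the test function are hypothetical):

```python
import numpy as np

def pc_interpolate(f, n):
    """Piecewise-constant interpolant of f on the uniform tensor-product
    grid with n cells per axis on [0, 1]^d: each point is mapped to the
    value of f at the lower-left node of its grid cell."""
    def approx(x):
        # snap each coordinate to the grid node just below it
        idx = np.minimum(np.floor(np.asarray(x, dtype=float) * n), n - 1)
        return f(idx / n)
    return approx

# a Lipschitz test function on [0, 1]^2; the sup-error decays like O(1/n)
f = lambda x: np.sin(x[0]) + x[1] ** 2
fn = pc_interpolate(f, 64)
err = max(abs(f((a, b)) - fn(np.array([a, b])))
          for a in np.linspace(0, 1, 30)
          for b in np.linspace(0, 1, 30))
```

For a function with Lipschitz constant L per coordinate, the error of this interpolant is bounded by roughly d·L/n, matching the smoothness-for-rate trade-off described above in its simplest form.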

ReLU Network Approximation in Terms of Intrinsic Parameters

This paper shows that the number of parameters that need to be learned can be significantly smaller than commonly expected, and conducts several experiments verifying that training a small subset of the parameters can achieve good results on classification problems if the remaining parameters are pre-specified or pre-trained on a related problem.
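A toy version of this idea, with all hidden-layer parameters pre-specified at random and only the output layer trained (a hedged sketch under simplifying assumptions; the paper's constructions are far more specific than this random-features setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# toy data: a smooth target on [-1, 1]^2
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2

# hidden layer is pre-specified at random and never trained
W = rng.normal(size=(2, 100))
b = rng.normal(size=100)
H = relu(X @ W + b)

# only the 100 output weights are learned (ordinary least squares)
c, *_ = np.linalg.lstsq(H, y, rcond=None)
mse = np.mean((H @ c - y) ** 2)
```

Even with the vast majority of parameters frozen at random values, the trained output layer fits the smooth target to small error, illustrating the "train a small part of the parameters" phenomenon in miniature.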

Simultaneous neural network approximation for smooth functions

Approximation with Tensor Networks. Part III: Multivariate Approximation

Tensor networks exhibit universal expressivity w.r.t. isotropic, anisotropic and mixed smoothness spaces that is comparable with more general neural network families such as deep rectified linear unit (ReLU) networks.

Simultaneous Neural Network Approximations in Sobolev Spaces

This work shows that deep ReLU networks of width $O(N \log N)$ and depth $O(L \log L)$ can achieve a non-asymptotic approximation rate of $O(N^{-2(s-1)/d} L^{-2(s-1)/d})$ with respect to the $W^{1,p}([0,1]^d)$ norm for $p \in [1,\infty)$.

Do ReLU Networks Have An Edge When Approximating Compactly-Supported Functions?

It is shown that polynomial regressors and analytic feedforward networks are not universal in this space, and a quantitative uniform version of the universal approximation theorem is derived on the dense subclass of compactly-supported Lipschitz functions.

How degenerate is the parametrization of neural networks with the ReLU activation function?

The pathologies which prevent inverse stability in general are presented, and it is shown that by optimizing over such restricted sets, it is still possible to learn any function which can be learned by optimization over unrestricted sets.

Computation complexity of deep ReLU neural networks in high-dimensional approximation

Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay

It is proved that for a family of non-smooth activation functions, including ReLU, approximating any single neuron with random features suffers from the curse of dimensionality, providing an explicit separation of expressiveness between neural networks and random feature models.

Optimal approximation of piecewise smooth functions using deep ReLU neural networks

Optimal Approximation with Sparsely Connected Deep Neural Networks

All function classes that are optimally approximated by a general class of representation systems (so-called affine systems) can be approximated by deep neural networks with minimal connectivity and memory requirements, and the lower bounds are proved to be achievable for a broad family of function classes.

Provable approximation properties for deep neural networks

Optimal approximation of continuous functions by very deep ReLU networks

It is proved that constant-width fully-connected networks of depth $L\sim W$ provide the fastest possible approximation rate $\|f-\widetilde f\|_\infty = O(\omega_f(O(W^{-2/\nu})))$ that cannot be achieved with less deep networks.

On the Expressive Power of Deep Learning: A Tensor Analysis

It is proved that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be realized (or even approximated) by a shallow network.

Deep vs. shallow networks: An approximation theory perspective

A new definition of relative dimension is proposed to encapsulate different notions of sparsity of a function class that can possibly be exploited by deep networks but not by shallow ones to drastically reduce the complexity required for approximation and learning.

Neural Networks for Optimal Approximation of Smooth and Analytic Functions

  • H. Mhaskar
  • Neural Computation
  • 1996
We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation…

Nonparametric regression using deep neural networks with ReLU activation function

The theory suggests that, for nonparametric regression, scaling the network depth with the sample size is natural, and the analysis gives some insight into why multilayer feedforward neural networks perform well in practice.

Error bounds for approximations with deep ReLU networks

Nearly-tight VC-dimension and Pseudodimension Bounds for Piecewise Linear Neural Networks

New upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function are proved: the VC-dimension has no dependence on depth for piecewise-constant activations, linear dependence for piecewise-linear, and no more than quadratic dependence for general piecewise-polynomial activations.