# Universal Approximation with Deep Narrow Networks

@article{Kidger2019UniversalAW, title={Universal Approximation with Deep Narrow Networks}, author={Patrick Kidger and Terry Lyons}, journal={ArXiv}, year={2019}, volume={abs/1905.08539} }

The classical Universal Approximation Theorem holds for neural networks of arbitrary width and bounded depth. Here we consider the natural "dual" scenario for networks of bounded width and arbitrary depth. Precisely, let $n$ be the number of input neurons, $m$ be the number of output neurons, and let $\rho$ be any nonaffine continuous function with a continuous nonzero derivative at some point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and…
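The theorem is existential, but the architecture it concerns is easy to write down. Below is a minimal NumPy sketch (all dimensions hypothetical) of a deep, narrow ReLU network with $n = 3$ inputs and $m = 1$ output, so every hidden layer has the fixed width $n + m + 2 = 6$. This only illustrates the shape of the networks the theorem covers, not the paper's explicit construction.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def deep_narrow_forward(x, weights, biases):
    """Forward pass through a deep, narrow fully connected network.

    Every hidden layer has the same fixed width; the depth is arbitrary.
    """
    h = x
    for W, b in zip(weights, biases):
        h = relu(W @ h + b)
    return h

# Hypothetical sizes for illustration: n = 3 inputs, m = 1 output,
# so each hidden layer has width n + m + 2 = 6, and depth is free.
n, m = 3, 1
width = n + m + 2
depth = 10
rng = np.random.default_rng(0)

weights = [rng.standard_normal((width, n))]                        # input -> first hidden layer
weights += [rng.standard_normal((width, width)) for _ in range(depth - 1)]
biases = [rng.standard_normal(width) for _ in range(depth)]

h = deep_narrow_forward(rng.standard_normal(n), weights, biases)
y = h[:m]          # read the m output neurons off the final layer
print(y.shape)     # prints (1,)
```

The weights here are random; the theorem asserts that for any continuous target function and tolerance, *some* choice of depth and weights in this fixed-width family achieves the approximation.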

## 107 Citations

Minimum Width for Universal Approximation

- Computer Science, ICLR
- 2021

This work provides the first definitive result in this direction for networks using the ReLU activation function: the minimum width required for universal approximation of $L^p$ functions is exactly $\max\{d_x+1, d_y\}$.

Arbitrary-Depth Universal Approximation Theorems for Operator Neural Networks

- Mathematics, Computer Science, ArXiv
- 2021

It is proved that for non-polynomial activation functions that are continuously differentiable at a point with a nonzero derivative, one can construct an operator NN of width five, whose inputs are real numbers with finite decimal representations, that is arbitrarily close to any given continuous nonlinear operator.

Quantitative Rates and Fundamental Obstructions to Non-Euclidean Universal Approximation with Deep Narrow Feed-Forward Networks

- Mathematics, Computer Science, ArXiv
- 2021

The number of narrow layers required for these "deep geometric feed-forward neural networks" (DGNs) to approximate any continuous function in $C(X,Y)$, uniformly on compacts, is quantified, and a quantitative version of the universal approximation theorem is obtained.

Abstract Universal Approximation for Neural Networks

- Computer Science, ArXiv
- 2020

The AUA theorem shows that there exists a neural network that approximates $f$ and for which proofs of robustness can be constructed automatically using the interval abstract domain, shedding light on the existence of provably correct neural networks.

Width is Less Important than Depth in ReLU Neural Networks

- Computer Science, ArXiv
- 2022

It is shown that depth plays a more significant role than width in the expressive power of neural networks, and an exact representation of wide, shallow networks by deep, narrow networks is given which, in certain cases, does not increase the number of parameters over the target network.

Universal Approximation Under Constraints is Possible with Transformers

- Computer Science, Mathematics, ArXiv
- 2021

A quantitative constrained universal approximation theorem which guarantees that for any convex or non-convex compact set $K$ and any continuous function $f : \mathbb{R} \to K$, there is a probabilistic transformer $\hat{F}$ whose randomized outputs all lie in $K$ and whose expected output uniformly approximates $f$.

Universal approximation power of deep residual neural networks via nonlinear control theory

- Computer Science, Mathematics, ICLR
- 2021

The universal approximation capabilities of deep residual neural networks are explained through geometric nonlinear control, and monotonicity is identified as the bridge between controllability of finite ensembles and uniform approximability on compact sets.

Piecewise-Linear Activations or Analytic Activation Functions: Which Produce More Expressive Neural Networks?

- Computer Science, ArXiv
- 2022

The main result demonstrates that deep networks with piecewise linear activations (e.g. ReLU or PReLU) are fundamentally more expressive than deep feedforward networks with analytic activation functions; this is further explained by quantitatively demonstrating the "separation phenomenon" between the networks in $\mathcal{NN}^{\mathrm{ReLU+Pool}}$.

Universal Approximation Power of Deep Neural Networks via Nonlinear Control Theory

- Computer Science, Mathematics, ArXiv
- 2020

This paper provides a general sufficient condition for a residual network to have the power of universal approximation by asking the activation function, or one of its derivatives, to satisfy a quadratic differential equation.

Characterizing the Universal Approximation Property

- Computer Science
- 2019

This paper constructs a modification of the feed-forward architecture which can approximate any continuous function, with a controlled growth rate, uniformly on the entire domain space, and shows that the feed-forward architecture typically cannot.

## References

Showing 1-10 of 35 references

Approximating Continuous Functions by ReLU Nets of Minimal Width

- Computer Science, ArXiv
- 2017

This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d\geq 1,$ what is the minimal…

The Expressive Power of Neural Networks: A View from the Width

- Computer Science, NIPS
- 2017

It is shown that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound, and that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide and shallow networks with high accuracy.

Understanding Deep Neural Networks with Rectified Linear Units

- Computer Science, Mathematics, Electron. Colloquium Comput. Complex.
- 2017

The gap theorems hold for smoothly parametrized families of "hard" functions, contrary to the countable, discrete families known in the literature, and a new lower bound on the number of affine pieces is shown, larger than previous constructions in certain regimes of the network architecture.

Optimal approximation of piecewise smooth functions using deep ReLU neural networks

- Computer Science, Neural Networks
- 2018

Error bounds for approximations with deep ReLU neural networks in $W^{s, p}$ norms

- Computer Science, Mathematics, Analysis and Applications
- 2019

This work constructs, based on a calculus of ReLU networks, artificial neural networks with ReLU activation functions that achieve certain approximation rates and establishes lower bounds for the approximation by ReLU neural networks for classes of Sobolev-regular functions.

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

- Computer Science, NeurIPS
- 2019

By exploiting depth, it is shown that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points, and it is proved that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, giving tight bounds on memorization capacity.

Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds

- Computer Science, NeurIPS
- 2019

This paper proves that neural networks can efficiently approximate functions supported on low dimensional manifolds, with an exponent depending on the intrinsic dimension of the data and the smoothness of the function.

Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units

- Computer Science, Neural Computation
- 2014

This analysis covers discrete restricted Boltzmann machines and naive Bayes models as special cases, and shows that a q-ary deep belief network with sufficiently many narrow layers can approximate any probability distribution on its state space without exceeding a given Kullback-Leibler divergence.

Nonlinear Approximation and (Deep) ReLU Networks

- Computer Science, ArXiv
- 2019

The main results of this article prove that neural networks possess even greater approximation power than these traditional methods of nonlinear approximation, exhibiting large classes of functions which can be efficiently captured by neural networks but for which classical nonlinear methods fall short.

How degenerate is the parametrization of neural networks with the ReLU activation function?

- Computer Science, Mathematics, NeurIPS
- 2019

The pathologies which prevent inverse stability in general are presented, and it is shown that by optimizing over such restricted sets, it is still possible to learn any function which can be learned by optimization over unrestricted sets.