Corpus ID: 219708742

# Minimum Width for Universal Approximation

@article{Park2021MinimumWF,
title={Minimum Width for Universal Approximation},
author={Sejun Park and Chulhee Yun and Jaeho Lee and Jinwoo Shin},
journal={ArXiv},
year={2021},
volume={abs/2006.08859}
}
The universal approximation property of width-bounded networks has been studied as a dual of classical universal approximation results on depth-bounded networks. However, the critical width enabling the universal approximation has not been exactly characterized in terms of the input dimension $d_x$ and the output dimension $d_y$. In this work, we provide the first definitive result in this direction for networks using the ReLU activation functions: The minimum width required for the universal…
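For context on the truncated sentence: the published version of the paper states the critical width exactly (this value is quoted from the published result, not recoverable from the snippet above). For $L^p$ universal approximation of functions from $\mathbb{R}^{d_x}$ to $\mathbb{R}^{d_y}$ by ReLU networks, the minimum sufficient and necessary width is

```latex
w_{\min} = \max(d_x + 1,\; d_y)
```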

#### Citations of this paper

Expressiveness of Neural Networks Having Width Equal or Below the Input Dimension
• Computer Science, Mathematics
• ArXiv
• 2020
It is concluded from a maximum principle that for all continuous and monotonic activation functions, universal approximation of arbitrary continuous functions is impossible on sets that coincide with the boundary of an open set plus an inner point of that set.
Characterizing the Universal Approximation Property
This paper constructs a modification of the feed-forward architecture which can approximate any continuous function, with a controlled growth rate, uniformly on the entire domain space, and it is shown that the unmodified feed-forward architecture typically cannot.
Quantitative Rates and Fundamental Obstructions to Non-Euclidean Universal Approximation with Deep Narrow Feed-Forward Networks
• Computer Science
• ArXiv
• 2021
The number of narrow layers required for these "deep geometric feed-forward neural networks" (DGNs) to approximate any continuous function in C(X, Y) uniformly on compacts is quantified, and a quantitative version of the universal approximation theorem is obtained.
Universal approximation power of deep residual neural networks via nonlinear control theory
• Computer Science
• ICLR
• 2021
The universal approximation capabilities of deep residual neural networks through geometric nonlinear control are explained and monotonicity is identified as the bridge between controllability of finite ensembles and uniform approximability on compact sets.
Overcoming The Limitations of Neural Networks in Composite-Pattern Learning with Architopes
• Computer Science
• ArXiv
• 2020
It is demonstrated that the feed-forward architecture, for most commonly used activation functions, is incapable of approximating functions comprised of multiple sub-patterns while simultaneously respecting their composite-pattern structure; a simple architecture modification is therefore implemented that reallocates the neurons of any single feed-forward network across several smaller sub-networks, each specialized on a distinct part of the input space.
How Attentive are Graph Attention Networks?
• Computer Science
• ArXiv
• 2021
It is shown that GATs can only compute a restricted kind of attention, in which the ranking of attended nodes is unconditioned on the query node; a simple fix that modifies the order of operations is introduced, yielding GATv2: a dynamic graph attention variant that is strictly more expressive than GAT.
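The "static attention" limitation can be seen in a few lines of numpy (my own illustrative sketch, not code from the paper): in the GAT scoring form $\mathrm{LeakyReLU}(a^\top[Wh_i \,\|\, Wh_j])$, the query-dependent half contributes only a per-query additive constant, and since LeakyReLU is strictly increasing, every query node ranks the attended nodes identically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
H = rng.normal(size=(n, d))   # node features
W = rng.normal(size=(d, d))   # shared linear map
a1 = rng.normal(size=d)       # attention vector, query half
a2 = rng.normal(size=d)       # attention vector, key half

Z = H @ W
# GAT-style "static" scores: e[i, j] = LeakyReLU(a1.Z_i + a2.Z_j)
raw = (Z @ a1)[:, None] + (Z @ a2)[None, :]
scores = np.where(raw > 0, raw, 0.2 * raw)  # LeakyReLU, slope 0.2

# a1.Z_i is constant per query and LeakyReLU is monotone increasing,
# so the ranking over attended nodes j is the same for every query i.
rankings = np.argsort(scores, axis=1)
assert all((rankings[i] == rankings[0]).all() for i in range(n))
```

GATv2 avoids this by applying the nonlinearity before the shared attention vector, so the score is no longer a monotone function of a query-independent term.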
On decision regions of narrow deep neural networks
• Computer Science, Mathematics
• Neural Networks
• 2021
We show that for neural network functions that have width less than or equal to the input dimension, all connected components of decision regions are unbounded. The result holds for continuous and strictly…
Uncertainty Principles of Encoding GANs
• Ruili Feng, Zhouchen Lin, Deli Zhao, Jingren Zhou
• Computer Science
• ICML
• 2021
It is proved that the ‘perfect’ encoder and generator cannot be continuous at the same time, which implies that the current framework of encoding GANs is ill-posed and needs rethinking, and that neural networks cannot approximate the underlying encoders and generators precisely at the same time.
Non-Euclidean Universal Approximation
• Computer Science, Mathematics
• NeurIPS
• 2020
General conditions describing feature and readout maps that preserve an architecture's ability to approximate any continuous function uniformly on compacts are presented, and it is shown that if an architecture is capable of universal approximation, then modifying its final layer to produce binary values creates a new architecture capable of deterministically approximating any classifier.

#### References

SHOWING 1-10 OF 43 REFERENCES
Universal Approximation with Deep Narrow Networks
• Computer Science, Mathematics
• COLT 2019
• 2019
The classical Universal Approximation Theorem is shown to hold for neural networks of arbitrary width and bounded depth, covering nowhere-differentiable activation functions and density in noncompact domains with respect to the $L^p$-norm, and it is shown how the width may be reduced to just $n + m + 1$ for 'most' activation functions.
Approximating Continuous Functions by ReLU Nets of Minimal Width
• Computer Science, Mathematics
• ArXiv
• 2017
This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d\geq 1,$ what is the minimal…
The Expressive Power of Neural Networks: A View from the Width
• Computer Science, Mathematics
• NIPS
• 2017
It is shown that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound, and that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate a wide and shallow network with high accuracy.
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
• Computer Science, Mathematics
• NeurIPS
• 2019
By exploiting depth, it is shown that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points, and it is proved that width $\Theta(N)$ is necessary and sufficient for memorizing $N$ data points, proving tight bounds on memorization capacity.
Benefits of Depth in Neural Networks
This result is proved here for a class of nodes termed "semi-algebraic gates", which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees.
Why Does Deep and Cheap Learning Work So Well?
• Mathematics, Physics
• ArXiv
• 2016
It is argued that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine learning, a deep neural network can be more efficient than a shallow one.
The Power of Depth for Feedforward Neural Networks
• Computer Science, Mathematics
• COLT
• 2016
It is shown that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network, unless its width is exponential in the dimension.
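The width-$\Theta(N)$ sufficiency direction of the memorization claim above has a short explicit construction in one dimension (a sketch of the standard piecewise-linear interpolation argument, not the paper's depth-based $\Omega(\sqrt{N})$ construction): a one-hidden-layer ReLU network with $N-1$ units interpolates any $N$ distinct scalar points exactly.

```python
import numpy as np

def relu_interpolate(x, y):
    """Fit N points (x sorted, distinct) with a one-hidden-layer ReLU net.

    Returns f with f(x[i]) == y[i]:
        f(t) = y[0] + sum_k c[k] * relu(t - x[k]),
    i.e. a ReLU network of width N-1 (piecewise-linear construction).
    """
    slopes = np.diff(y) / np.diff(x)      # slope on each interval
    c = np.diff(slopes, prepend=0.0)      # c[0] = m[0], c[k] = m[k] - m[k-1]
    knots = x[:-1]
    return lambda t: y[0] + np.maximum(np.asarray(t)[..., None] - knots, 0.0) @ c

rng = np.random.default_rng(1)
N = 20
xs = np.sort(rng.uniform(-3, 3, size=N))
ys = rng.normal(size=N)
f = relu_interpolate(xs, ys)
assert np.allclose(f(xs), ys)  # zero training error with width N - 1
```

Each ReLU unit bends the function at one data point, so the slopes between consecutive points can be matched one interval at a time.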
On the capabilities of multilayer perceptrons
• E. Baum
• Computer Science, Mathematics
• J. Complex.
• 1988
A construction is presented here for implementing an arbitrary dichotomy with one hidden layer containing $\lceil N/d \rceil$ units, for any set of $N$ points in general position in $d$ dimensions; this is in fact the smallest such net, as there exist dichotomies which cannot be implemented by any net with fewer units.
The Jordan–Schönflies theorem and the classification of surfaces
INTRODUCTION. The Jordan curve theorem says that a simple closed curve in the Euclidean plane partitions the plane into precisely two parts: the interior and the exterior of the curve. Although this…
Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions
• Mathematics, Computer Science
• IEEE Trans. Neural Networks
• 1998
This paper rigorously proves that standard single-hidden layer feedforward networks with at most N hidden neurons and with any bounded nonlinear activation function which has a limit at one infinity can learn N distinct samples with zero error.
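The zero-error claim can be checked numerically in a few lines (my own sketch of the generic-invertibility argument with randomly drawn hidden weights, not the paper's construction): with $N$ tanh neurons, a bounded nonlinear activation, the $N \times N$ hidden-layer matrix is generically invertible, so output weights achieving zero error on $N$ distinct samples exist.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 15, 3
X = rng.normal(size=(N, d))   # N distinct samples
y = rng.normal(size=N)        # arbitrary targets

# Random hidden layer: N tanh neurons (a bounded nonlinear activation).
W = rng.normal(size=(d, N))
b = rng.normal(size=N)
H = np.tanh(X @ W + b)        # N x N hidden-output matrix

# For generic weights H is invertible, so output weights giving exactly
# zero training error exist: solve H @ beta = y.
beta = np.linalg.solve(H, y)
assert np.allclose(H @ beta, y)
```

Solving the square linear system exactly, rather than least-squares fitting, is what makes the training error identically zero here.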