Corpus ID: 193956

Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks

@inproceedings{Safran2017DepthWidthTI,
  title={Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks},
  author={Itay Safran and Ohad Shamir},
  booktitle={ICML},
  year={2017}
}
We provide several new depth-based separation results for feed-forward neural networks, proving that various types of simple and natural functions can be better approximated using deeper networks than shallower ones, even if the shallower networks are much larger. This includes indicators of balls and ellipses; non-linear functions which are radial with respect to the $L_1$ norm; and smooth non-linear functions. We also show that these gaps can be observed experimentally: Increasing the depth… 
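The experimental gap mentioned at the end of the abstract can be illustrated in a few lines of code. The following is a minimal sketch, not the authors' exact experiment: it fits the indicator of a Euclidean ball in R^d with depth-2 and depth-3 ReLU networks of equal width and compares the achieved squared error; the dimension, width, radius, and optimizer settings are arbitrary illustrative choices.

import torch
import torch.nn as nn

# Illustrative sketch (not the paper's setup): approximate the ball indicator
# 1[||x|| <= sqrt(d)] on Gaussian inputs with ReLU networks of depth 2 vs. 3.
d, n, width = 10, 20000, 100
torch.manual_seed(0)
X = torch.randn(n, d)
y = (X.norm(dim=1) <= d ** 0.5).float().unsqueeze(1)  # roughly balanced labels

def mlp(depth, width):
    # `depth` counts affine layers; depth 2 has a single hidden ReLU layer.
    layers, in_dim = [], d
    for _ in range(depth - 1):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))
    return nn.Sequential(*layers)

for depth in (2, 3):
    net = mlp(depth, width)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), y)
        loss.backward()
        opt.step()
    print(f"depth {depth}: final squared error {loss.item():.4f}")

Under this setup one would expect the deeper network to reach a noticeably smaller error at equal width, which is the qualitative effect the paper reports; the exact numbers depend on the illustrative choices above.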

Citations

Optimization-Based Separations for Neural Networks
TLDR
It is proved that when the data are generated by a distribution with radial symmetry which satisfies some mild assumptions, gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations, and where the hidden layer is held fixed throughout training.
A lattice-based approach to the expressivity of deep ReLU neural networks
TLDR
It is shown that these functions can be seen as the high-dimensional generalization of the triangle wave function used by Telgarsky in 2016, and it is proved that they can be computed by ReLU networks with quadratic depth and linear width in the space dimension.
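The one-dimensional object referenced above is easy to make concrete. A minimal sketch under my own naming, not the paper's code: Telgarsky's triangle wave is the k-fold composition of the hat map g(x) = 2x on [0, 1/2] and 2(1 - x) on [1/2, 1], which a width-2, depth-k ReLU network computes exactly via g(x) = 2*relu(x) - 4*relu(x - 1/2); each composition doubles the number of oscillations.

import numpy as np

def hat(x):
    # g(x) = 2*relu(x) - 4*relu(x - 1/2): equals 2x on [0, 1/2], 2(1 - x) on [1/2, 1]
    return 2 * np.maximum(x, 0.0) - 4 * np.maximum(x - 0.5, 0.0)

def triangle_wave(x, k):
    # k-fold composition: exactly what a width-2, depth-k ReLU network computes
    for _ in range(k):
        x = hat(x)
    return x

xs = np.linspace(0.0, 1.0, 10001)
for k in (1, 2, 3, 4):
    ys = triangle_wave(xs, k)
    # count strict local maxima to verify the doubling of oscillations
    n_peaks = int(np.sum((ys[1:-1] > ys[:-2]) & (ys[1:-1] > ys[2:])))
    print(f"depth {k}: {n_peaks} peaks (expected {2 ** (k - 1)})")

A depth-2 ReLU network, by contrast, needs width exponential in k to approximate this wave well, which is the kind of separation the lattice-based construction generalizes to high dimensions.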
Depth separation and weight-width trade-offs for sigmoidal neural networks
TLDR
This work provides a simple proof of L2-norm separation between the expressive power of depth-2 and depth-3 sigmoidal neural networks for a large class of input distributions, assuming their weights are polynomially bounded.
Neural Networks with Small Weights and Depth-Separation Barriers
TLDR
This paper provides a negative and constructive answer to whether there are polynomially-bounded functions which require super-polynomial weights in order to be approximated by constant-depth neural networks, and proves fundamental barriers to proving such results beyond depth $4$ by reduction to open problems and natural-proof barriers in circuit complexity.
Width is Less Important than Depth in ReLU Neural Networks
TLDR
It is shown that depth plays a more significant role than width in the expressive power of neural networks, and an exact representation of wide and shallow networks using deep and narrow networks which, in certain cases, does not increase the number of parameters over the target network.
Size and Depth Separation in Approximating Benign Functions with Neural Networks
TLDR
It is shown that beyond depth 4 there is a barrier to showing depth-separation for benign functions, even between networks of constant depth and networks of nonconstant depth, and superpolynomial size lower bounds and barriers to such lower bounds are shown, depending on the assumptions on the function.
Depth separation beyond radial functions
TLDR
The focus is on a compact approximation domain, namely the sphere $S^{d-1}$ in dimension $d$, where it is shown that if the domain radius and the rate of oscillation of the objective function are constant, then approximation by one-hidden-layer networks holds at a poly($d$) rate for any error threshold.
The Connection Between Approximation, Depth Separation and Learnability in Neural Networks
TLDR
It is shown that a necessary condition for a function to be learnable by gradient descent on deep neural networks is to be able to approximate the function, at least in a weak sense, with shallow neural networks.
Layer Folding: Neural Network Depth Reduction using Activation Linearization
TLDR
This work proposes a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one, and applies this method to networks pre-trained on CIFAR-10 and CIFAR-100, finding that they can all be transformed into shallower forms that share a similar depth.
Interplay between depth of neural networks and locality of target functions
TLDR
A remarkable interplay between depth and locality of a target function is reported, and it is found that depth is beneficial for learning local functions but detrimental to learning global functions.
...
...

References

Showing 1-10 of 15 references
The Power of Depth for Feedforward Neural Networks
TLDR
It is shown that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network unless its width is exponential in the dimension.
On the complexity of shallow and deep neural network classifiers
TLDR
Upper and lower bounds on network complexity are established, based on the number of hidden units and on their activation functions, showing that deep architectures are able, with the same number of resources, to address more difficult classification problems.
Provable approximation properties for deep neural networks
Why Deep Neural Networks for Function Approximation?
TLDR
It is shown that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation.
Why Deep Neural Networks?
Benefits of Depth in Neural Networks
TLDR
The depth-separation result is proved here for a class of nodes termed "semi-algebraic gates", which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Shallow vs. Deep Sum-Product Networks
TLDR
It is proved that there exist families of functions that can be represented much more efficiently with a deep network than with a shallow one, i.e. with substantially fewer hidden units.
On the Expressive Power of Deep Learning: A Tensor Analysis
TLDR
It is proved that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be realized (or even approximated) by a shallow network.
...
...