Corpus ID: 235436079

# An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

@article{Rajput2021AnEI,
title={An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks},
author={Shashank Rajput and Kartik K. Sreenivasan and Dimitris Papailiopoulos and Amin Karbasi},
journal={ArXiv},
year={2021},
volume={abs/2106.07724}
}
It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long-standing question by Baum (1988), proving that deep threshold networks can memorize n points in d dimensions using Õ(e^{1/δ²} + √n) neurons and Õ(e^{1/δ²}(d + √n) + n) weights, where δ is the minimum distance between the points. In this work, we improve the dependence on δ from exponential to almost linear, proving that Õ( 1…
1 Citation

On the Optimal Memorization Power of ReLU Neural Networks
• Computer Science, Mathematics
• ArXiv
• 2021
A generalized construction is given for networks with depth bounded by 1 ≤ L ≤ √N, memorizing N samples using Õ(N/L) parameters, and it is proved that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.

#### References

SHOWING 1-10 OF 27 REFERENCES
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
• Computer Science, Mathematics
• NeurIPS
• 2019
By exploiting depth, it is shown that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points, and it is proved that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, giving tight bounds on memorization capacity.
A Closer Look at Memorization in Deep Networks
The analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.
Estimates of Storage Capacity of Multilayer Perceptron with Threshold Logic Hidden Units
• A. Kowalczyk
• Mathematics, Computer Science
• Neural Networks
• 1997
The storage capacity of a multilayer perceptron with $n$ inputs, $h_1$ threshold logic units in the first hidden layer and 1 output is estimated, and it is shown that such a network has memory capacity between $nh_1 + 1$ and $2(nh_1 + 1)$ input patterns and, for the most efficient networks in this class, between 1 and 2 input patterns per connection.
Bounds on the learning capacity of some multi-layer networks
• Mathematics
• Biological Cybernetics
• 2004
We obtain bounds for the capacity of some multi-layer networks of linear threshold units. In the case of a network having n inputs, a single layer of h hidden units and an output layer of s units, …
On the capabilities of multilayer perceptrons
• E. Baum
• Computer Science, Mathematics
• J. Complex.
• 1988
A construction is presented here for implementing an arbitrary dichotomy with one hidden layer containing $\lceil N/d \rceil$ units, for any set of $N$ points in general position in $d$ dimensions. This is in fact the smallest such net, since there are dichotomies that cannot be implemented by any net with fewer units.
Understanding deep learning requires rethinking generalization
• Computer Science
• ICLR
• 2017
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
Identity Matters in Deep Learning
• Computer Science, Mathematics
• ICLR
• 2017
This work gives a strikingly simple proof that arbitrarily deep linear residual networks have no spurious local optima and shows that residual networks with ReLU activations have universal finite-sample expressivity in the sense that the network can represent any function of its sample provided that the model has more parameters than the sample size.
Learning capability and storage capacity of two-hidden-layer feedforward networks
• Guangbin Huang
• Computer Science, Medicine
• IEEE Trans. Neural Networks
• 2003
This paper rigorously proves, by a constructive method, that two-hidden-layer feedforward networks (TLFNs) with $2\sqrt{(m+2)N}$ ($\ll N$) hidden neurons can learn any $N$ distinct samples with any arbitrarily small error, where $m$ is the required number of output neurons.
A simple method to derive bounds on the size and to train multilayer neural networks
• Mathematics, Computer Science
• IEEE Trans. Neural Networks
• 1991
The training set can be implemented with zero error with two layers and with the number of the hidden-layer neurons equal to #1 ≥ p-1, and the method presented exactly solves (M), the multilayer neural network training problem, for any arbitrary training set.
Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
• Computer Science, Mathematics
• ICLR
• 2019
A novel complexity measure based on unit-wise capacities resulting in a tighter generalization bound for two layer ReLU networks and a matching lower bound for the Rademacher complexity that improves over previous capacity lower bounds for neural networks are presented.