# Rethinking Neural Networks With Benford's Law

```bibtex
@inproceedings{Sahu2021RethinkingNN,
  title  = {Rethinking Neural Networks With Benford's Law},
  author = {Surya Kant Sahu and Abhinav Java and Arshad Shaikh and Yannic Kilcher},
  year   = {2021}
}
```

Benford’s Law (BL), also known as the Significant Digit Law, defines the probability distribution of the first digit of numerical values in a data sample. The law is observed in many real-world datasets; it can be seen as a measure of the naturalness of a given distribution and finds application in areas such as anomaly and fraud detection. In this work, we address the following question: Is the distribution of the Neural Network parameters related to the network’s generalization capability? To that end, we first define…
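The distribution the abstract refers to can be made concrete. Under Benford's Law, the first significant digit d ∈ {1, …, 9} occurs with probability P(d) = log10(1 + 1/d). A minimal sketch (not the authors' code; the function names are illustrative) of the expected distribution and an empirical first-digit histogram, as one might compute over a network's parameters:

```python
import math
from collections import Counter

def benford_pmf():
    # Benford's Law: P(d) = log10(1 + 1/d) for first digit d in 1..9
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_dist(values):
    # Empirical distribution of the first significant digit of nonzero values.
    # Scientific notation ("3.14e+00") puts the first significant digit first.
    digits = [int(f"{abs(v):e}"[0]) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}
```

Comparing `first_digit_dist(weights)` against `benford_pmf()` (e.g. with a divergence measure) gives one way to quantify how "Benford-like" a parameter distribution is; the specific measure used in the paper is not shown in this excerpt.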


## References

*Showing 1-10 of 31 references*

### Benford's law in the natural sciences

- Geology
- 2010

More than 100 years ago it was predicted that the distribution of first digits of real world observations would not be uniform, but instead follow a trend where measurements with lower first digit…

### Benford's Law for Natural and Synthetic Images

- Mathematics, CAe
- 2005

It is shown how light intensities in natural images, under certain constraints, obey Benford's Law closely and how light intensity in synthetic images follow this law whenever they are generated using physically realistic methods, and fail otherwise.

### An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

- Computer Science, ICLR
- 2014

It is found that it is always best to train using the dropout algorithm: dropout is consistently best at adapting to the new task and at remembering the old task, and has the best tradeoff curve between these two extremes.

### Understanding deep learning requires rethinking generalization

- Computer Science, ICLR
- 2017

These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.

### Adam: A Method for Stochastic Optimization

- Computer Science, ICLR
- 2015

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
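The "adaptive estimates of lower-order moments" in this summary refer to exponential moving averages of the gradient and its square. A minimal sketch of a single Adam update, written from the published algorithm with the paper's default hyperparameters (not an excerpt of any particular implementation):

```python
def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # First moment (mean) and second moment (uncentered variance) estimates,
    # maintained as exponential moving averages of the gradient g.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    # Bias correction: m and v start at zero, so early estimates are scaled up.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Parameter update, with eps guarding against division by zero.
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v
```

In practice `w`, `g`, `m`, and `v` are arrays and the same formula is applied elementwise; `t` is the 1-indexed step count used for bias correction.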

### Images and Benford's Law

- Computer Science, Journal of Mathematical Imaging and Vision
- 2004

It is shown in this paper that the magnitude of the gradient of an image obeys Benford's law; this connects to entropy-based coding, which takes advantage of a priori information about the probability of any symbol in the signal.

### Long Short-Term Memory

- Computer Science, Neural Computation
- 1997

A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

### Model selection and multimodel inference: a practical information-theoretic approach

- Computer Science
- 2003

The second edition of this book is unique in that it focuses on methods for making formal statistical inference from all the models in an a priori set (Multi-Model Inference). A philosophy is…

### Base-Invariance Implies Benford's Law

- Mathematics
- 1995

A derivation of Benford's Law, or the First-Digit Phenomenon, is given assuming only base-invariance of the underlying law. The only base-invariant distributions are shown to be convex combinations of…