Corpus ID: 237204520

Rethinking Neural Networks With Benford's Law

@inproceedings{Sahu2021RethinkingNN,
  title={Rethinking Neural Networks With Benford's Law},
  author={Surya Kant Sahu and Abhinav Java and Arshad Shaikh and Yannic Kilcher},
  year={2021}
}
Benford’s Law (BL), or the Significant Digit Law, defines the probability distribution of the first digit of numerical values in a data sample. This law is observed in many datasets, can be seen as a measure of the naturalness of a given distribution, and finds application in areas such as anomaly and fraud detection. In this work, we address the following question: Is the distribution of a neural network's parameters related to the network's generalization capability? To that end, we first define… 
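
As a rough sketch of how one might measure this (NumPy only; the helper names, the lognormal toy stand-in for trained weights, and the KL-based score are illustrative assumptions, not the metric the paper goes on to define in the truncated abstract above):

```python
import numpy as np

def leading_digit(x):
    """First significant digit of each nonzero value in x."""
    x = np.abs(x[x != 0])
    return np.floor(x / 10.0 ** np.floor(np.log10(x))).astype(int)

def digit_distribution(x):
    """Empirical distribution over leading digits 1..9."""
    counts = np.bincount(leading_digit(np.ravel(x)), minlength=10)[1:10]
    return counts / counts.sum()

# Benford's Law: P(d) = log10(1 + 1/d) for d = 1..9
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))

# Toy stand-in for a trained network's flattened parameter vector; in
# practice one would concatenate every weight tensor of the model.
params = np.random.lognormal(mean=0.0, sigma=2.0, size=100_000)

empirical = digit_distribution(params)
# One simple "Benfordness" score: KL divergence from the empirical digit
# distribution to Benford's (an illustrative proxy, not the paper's metric).
kl = float(np.sum(empirical * np.log((empirical + 1e-12) / benford)))
print("empirical:", np.round(empirical, 3))
print("Benford:  ", np.round(benford, 3))
print("KL(empirical || Benford):", round(kl, 4))
```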

References

Showing 1-10 of 31 references

Benford's law in the natural sciences

More than 100 years ago it was predicted that the distribution of first digits of real world observations would not be uniform, but instead follow a trend where measurements with a lower first digit (1, 2, ...) occur more frequently than those with higher first digits (..., 8, 9).
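
Stated precisely, Benford's Law assigns leading digit d the probability

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d \in \{1, 2, \ldots, 9\},
```

so a leading 1 occurs about 30.1% of the time while a leading 9 occurs only about 4.6% of the time.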

Benford's Law for Natural and Synthetic Images

TLDR
It is shown that light intensities in natural images, under certain constraints, obey Benford's Law closely, and that light intensities in synthetic images follow the law whenever they are generated using physically realistic methods and fail to do so otherwise.

The significant digit law in statistical physics

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

TLDR
It is found that it is always best to train using the dropout algorithm: dropout is consistently best at adapting to the new task and at remembering the old task, and it has the best tradeoff curve between these two extremes.

Understanding deep learning requires rethinking generalization

TLDR
These experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data, and confirm that simple depth two neural networks already have perfect finite sample expressivity.
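
As a toy illustration of the memorization result (a sketch with a small scikit-learn MLP on synthetic data, not the convolutional networks and image benchmarks used in the paper), an over-parameterized model can push training accuracy toward 100% even when every label is assigned at random:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))        # random inputs
y = rng.integers(0, 10, size=500)     # labels chosen completely at random

# A wide MLP has enough capacity to memorize this random labeling.
clf = MLPClassifier(hidden_layer_sizes=(512, 512), max_iter=2000, random_state=0)
clf.fit(X, y)
print("training accuracy on random labels:", clf.score(X, y))
```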

Adam: A Method for Stochastic Optimization

TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
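
A minimal NumPy sketch of the update rule described above (the adam_step helper name and the toy quadratic objective are mine; the default hyperparameters are the values suggested in the paper):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Usage: minimize f(w) = ||w||^2 from a random starting point.
w = np.random.randn(5)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * w                                # gradient of ||w||^2
    w, m, v = adam_step(w, grad, m, v, t)
print("final ||w||:", np.linalg.norm(w))
```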

Images and Benford's Law

  • J. Jolion
  • Computer Science
    Journal of Mathematical Imaging and Vision
  • 2004
TLDR
It is shown in this paper that the magnitude of the gradient of an image obeys Benford's law, and this leads to applications in entropy-based coding, which takes advantage of a priori information about the probability of any symbol in the signal.
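
A compact NumPy sketch of the measurement Jolion describes, reusing the same leading-digit idea as the earlier sketch; the synthetic surface only keeps the snippet self-contained, and a real photograph's grayscale array would be used in practice:

```python
import numpy as np

def leading_digit(x):
    """First significant digit of each positive value in x."""
    x = x[x > 0]
    return np.floor(x / 10.0 ** np.floor(np.log10(x))).astype(int)

# Synthetic stand-in for a grayscale natural image (smooth random surface).
img = np.random.rand(256, 256).cumsum(axis=0).cumsum(axis=1)

# Gradient-magnitude map via finite differences.
gy, gx = np.gradient(img)
magnitude = np.sqrt(gx ** 2 + gy ** 2)

counts = np.bincount(leading_digit(magnitude.ravel()), minlength=10)[1:10]
empirical = counts / counts.sum()
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
print("empirical:", np.round(empirical, 3))
print("Benford:  ", np.round(benford, 3))
```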

Long Short-Term Memory

TLDR
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
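
For concreteness, the cell equations of the commonly used LSTM variant are given below (note the forget gate was added after the original 1997 formulation); the additive cell-state update is the constant error carousel that keeps error flow from decaying across long time lags:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad &&\text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad &&\text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad &&\text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad &&\text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad &&\text{(constant error carousel)} \\
h_t &= o_t \odot \tanh(c_t) \quad &&\text{(hidden state)}
\end{aligned}
```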

Model selection and multimodel inference: a practical information-theoretic approach

The second edition of this book is unique in that it focuses on methods for making formal statistical inference from all the models in an a priori set (Multi-Model Inference). A philosophy is… 

Base-Invariance Implies Benford's Law

A derivation of Benford's Law or the First-Digit Phenomenon is given assuming only base-invariance of the underlying law. The only base-invariant distributions are shown to be convex combinations of… 