A Brief History of Generative Models for Power Law and Lognormal Distributions

@article{Mitzenmacher2003ABH,
  title={A Brief History of Generative Models for Power Law and Lognormal Distributions},
  author={Michael Mitzenmacher},
  journal={Internet Mathematics},
  year={2003},
  volume={1},
  pages={226--251}
}
Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying to learn enough about these distributions to settle the question, I found a rich and long history, spanning many fields. Indeed, several recently proposed models from the computer science community have antecedents in work from decades ago. Here, I briefly survey some of this history, focusing on underlying generative models… 
On the Power Laws of Language: Word Frequency Distributions
TLDR
A simple generative model is proposed to capture the word frequency distribution of languages and is shown to match the observations both analytically and empirically.
Swarm simulations of the power law distribution models
TLDR
This work bases its simulations on existing models in which incremental growth and preferential attachment are the key ingredients for the emergence of power laws, expands those models to include new variables, and proposes a new model without the incremental growth requirement.
How rare are power-law networks really?
TLDR
This paper modifies the well-known Kolmogorov–Smirnov test to achieve even sensitivity along the tail, considering the dependence between the empirical degrees under the null distribution, while guaranteeing sufficient power of the test.
Learning and Interpreting Complex Distributions in Empirical Data
TLDR
This paper showcases a four-parameter dynamic model together with inference and simulation algorithms, which is able to fit and generate a family of distributions, ranging from Gaussian, Exponential, Power Law, Stretched Exponential (Weibull), to their complex variants with multi-scale complexities.
Competition and fragmentation: a simple model generating lognormal-like distributions
The current distribution of language size in terms of speaker population is generally described using a lognormal distribution. Analyzing the original real data we show how the double-Pareto
Short-ranged memory model with preferential growth.
TLDR
A variant of the Yule-Simon model for preferential growth that incorporates a finite kernel to model the effects of bounded memory is introduced, and the properties of the model are characterized by combining analytical arguments with extensive numerical simulations.
Power laws, Pareto distributions and Zipf's law
TLDR
Some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them are reviewed.
Probability Distributions in Complex Systems
  • D. Sornette
  • Physics
    Encyclopedia of Complexity and Systems Science
  • 2009
TLDR
This essay enlarges the description of distributions by proposing that "kings", i.e., events even beyond the extrapolation of the power law tail, may reveal information that is complementary and perhaps sometimes even more important than the power-law distribution.
Power-Law Distributions in Empirical Data
TLDR
This work proposes a principled statistical framework for discerning and quantifying power-law behavior in empirical data by combining maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios.

References

The Double Pareto-Lognormal Distribution—A New Parametric Model for Size Distributions
Abstract A family of probability densities, which has proved useful in modelling the size distributions of various phenomena, including incomes and earnings, human settlement sizes, oil-field volumes
On 1/f noise and other distributions with long tails.
  • E. Montroll, M. Shlesinger
  • Mathematics
    Proceedings of the National Academy of Sciences of the United States of America
  • 1982
TLDR
A simple amplification model is introduced to characterize the transition from a log-normal distribution to an inverse-power Pareto tail.
Some Further Notes on a Class of Skew Distribution Functions
Population fluctuations, power laws and mixtures of lognormal distributions
A number of investigators have invoked a cascading local interaction model to account for power-law-distributed fluctuations in ecological variables. Invoking such a model requires that species be
Maximum entropy formalism, fractals, scaling phenomena, and 1/f noise: A tale of tails
In this report on examples of distribution functions with long tails we (a) show that the derivation of distributions with inverse power tails from a maximum entropy formalism would be a consequence
From gene families and genera to incomes and internet file sizes: why power laws are so common in nature.
  • W. Reed, B. Hughes
  • Economics
    Physical review. E, Statistical, nonlinear, and soft matter physics
  • 2002
TLDR
If stochastic processes with exponential growth in expectation are killed (or observed) randomly, the distribution of the killed or observed state exhibits power-law behavior in one or both tails.
On the tails of web file size distributions
TLDR
It is argued that the data usually available for classifying a distribution is insufficient to classify the tail and it is sufficient to focus on mechanisms leading to power-law-like “waists” of the distributions.
Informetric distributions, part I: Unified overview
This article is the first of a two‐part series on the informetric distributions, a family of regularities found to describe a wide range of phenomena both within and outside of the information
ON A CLASS OF SKEW DISTRIBUTION FUNCTIONS
It is the purpose of this paper to analyse a class of distribution functions that appears in a wide range of empirical data, particularly data describing sociological, biological and economic
Informetric distributions, part I: Unified overview
TLDR
The basic forms these regularities take are introduced, and a model is proposed that makes plausible the possibility that, in spite of marked differences in their appearance, these distributions are variants of a single distribution.