Random texts exhibit Zipf's-law-like word frequency distribution

@article{Li1992RandomTE,
  title={Random texts exhibit Zipf's-law-like word frequency distribution},
  author={Wentian Li},
  journal={IEEE Trans. Inf. Theory},
  year={1992},
  volume={38},
  pages={1842--1845}
}
  • Wentian Li
  • Published 1 November 1992
  • Mathematics
  • IEEE Trans. Inf. Theory
It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The facts that the frequency of occurrence of a word is almost an inverse power-law function of its rank, and that the exponent of this inverse power law is very close to 1, are largely due to the transformation from a word's length to its rank, which stretches an exponential function into a power-law function.
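The length-to-rank argument in the abstract can be checked numerically. The sketch below assumes the simplest "monkey typing" model (m equiprobable letters plus one blank, each typed with probability 1/(m+1)); the model details and all function names are illustrative assumptions, not code from the paper. Under that model a specific word of length L has probability proportional to (m+1)^(-L), which decays exponentially in L, while the m^L words of length L occupy ranks near m^L. Eliminating L gives p(r) ∝ r^(-α) with α = log(m+1)/log(m), about 1.012 for m = 26.

```python
import math
import random
import string
from collections import Counter


def predicted_exponent(m: int) -> float:
    """Zipf exponent log(m+1)/log(m) from the length-to-rank argument (assumed model)."""
    return math.log(m + 1) / math.log(m)


def block_points(m: int, max_len: int) -> list:
    """(first rank, log probability) for each word-length block, computed exactly.

    Assumes m equiprobable letters plus one blank, each typed with probability
    1/(m+1). Every specific word of length L has probability proportional to
    (m+1)**(-L), and the m**L words of length L fill one contiguous rank block.
    """
    points = []
    first_rank = 1
    for length in range(1, max_len + 1):
        log_p = -length * math.log(m + 1)  # unnormalized: only shifts the intercept
        points.append((first_rank, log_p))
        first_rank += m ** length  # the next block starts after all m**L words
    return points


def fitted_exponent(points: list) -> float:
    """Ordinary least-squares slope of log p versus log rank, sign-flipped."""
    xs = [math.log(rank) for rank, _ in points]
    ys = [log_p for _, log_p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


def random_text_word_counts(m: int, n_chars: int, seed: int = 0) -> Counter:
    """Type n_chars random characters (m letters + blank) and count the words."""
    if not 2 <= m <= 26:
        raise ValueError("m must be between 2 and 26")
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase[:m] + " "
    text = "".join(rng.choices(alphabet, k=n_chars))
    return Counter(text.split())  # split() discards empty words between blanks


def mean_count_per_word(counts: Counter, m: int, length: int) -> float:
    """Average count of a word of the given length, over all m**length possible words."""
    total = sum(c for w, c in counts.items() if len(w) == length)
    return total / m ** length


if __name__ == "__main__":
    print(f"predicted exponent (m=26): {predicted_exponent(26):.4f}")
    print(f"fitted exponent (m=26):    {fitted_exponent(block_points(26, 6)):.4f}")
    counts = random_text_word_counts(5, 200_000, seed=1)
    ratio = mean_count_per_word(counts, 5, 2) / mean_count_per_word(counts, 5, 1)
    print(f"length-2 / length-1 frequency ratio: {ratio:.3f} (model: {1/6:.3f})")
```

Run as a script, it prints the predicted and fitted exponents (both about 1.01 for a 26-letter alphabet) and a simulated frequency ratio that should sit near 1/(m+1), the exponential per-letter decay that the rank transformation turns into a power law. Fitting only the first rank of each length block is a simplification: the actual rank-frequency curve is a staircase, and the power law describes its envelope.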

Zipf's Law and Random Texts

TLDR
It is shown that real texts fill the lexical spectrum much more efficiently than random texts do, regardless of word length, suggesting that Zipf's law is highly meaningful.

Minimal models for text production and Zipf's law

  • J. Fontanari, L. Perlovsky
  • Physics
    International Conference on Integration of Knowledge Intensive Multi-Agent Systems, 2005.
  • 2005
TLDR
It is shown that when interaction is taken into account by allowing the words to compete amongst themselves for space in the memory of the users, the resulting word frequency distribution is best described by an exponential, rather than by a power-law.

Zipf's law of abbreviation as a language universal

TLDR
It is argued that this universal trend, whereby more frequently used words tend to be shorter, is likely to derive from fundamental principles of information processing and transfer.

Zipf's law against the text size: a half-rational model

TLDR
A simple model of the dependence of the Zipf-Mandelbrot law on text size is presented; it features a variable power-law tail and a constant ratio of the most frequent words.

Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution

TLDR
It is suggested that Zipf's law might in fact be a fundamental law in natural languages, because it is demonstrated that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text.

Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts

TLDR
It is concluded that the exponents of Zipf’s law for word forms and for lemmas are very similar, despite the remarkable transformation that going from words to lemmas represents, which considerably affects all ranges of frequencies.

Compression and the origins of Zipf's law for word frequencies

TLDR
A new derivation of Zipf's law for word frequencies, based on optimal coding, is presented; it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws.

Algorithmic information, complexity and Zipf's law

TLDR
It is found that natural languages have maximum complexity and it is argued that random text models are unsuitable for natural languages.

A Simple LNRE Model for Random Character Sequences

TLDR
The model, which has convenient analytical and numerical properties, is shown to be adequate for the description of language data extracted by automatic means from large text corpora and can be used to study the problems faced by the statistical analysis of such data in the field of natural-language processing.

Zipf's Law and Avoidance of Excessive Synonymy

TLDR
It is suggested that Zipf's law may result from a hierarchical organization of word meanings over the semantic space, which in turn is generated by the evolution of word semantics dominated by expansion of meanings and competition of synonyms.
...
