Compression and the origins of Zipf's law for word frequencies

Ramon Ferrer-i-Cancho
Here we sketch a new derivation of Zipf's law for word frequencies based on optimal coding. The structure of the derivation is reminiscent of Mandelbrot's random typing model, but it has multiple advantages over random typing: (1) it starts from realistic cognitive pressures, (2) it does not require fine tuning of parameters, and (3) it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws. Our findings suggest that the recurrence…
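For orientation, the random typing model that the abstract contrasts itself with can be simulated in a few lines. This is only a minimal sketch of random typing, not the optimal-coding derivation of the paper; the alphabet, the space probability `p_space`, and all function names are illustrative assumptions.

```python
import random
from collections import Counter


def random_typing(n_chars, alphabet="abcde", p_space=0.2, seed=0):
    """Type n_chars keys at random; a space ends the current word."""
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    return "".join(chars)


def rank_frequency(text):
    """Return word frequencies sorted from most to least frequent."""
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def mean_count_by_length(text):
    """Average frequency of the distinct words of each length."""
    counts = Counter(text.split())
    totals, types = Counter(), Counter()
    for word, count in counts.items():
        totals[len(word)] += count
        types[len(word)] += 1
    return {length: totals[length] / types[length] for length in totals}


text = random_typing(200_000)
freqs = rank_frequency(text)
by_len = mean_count_by_length(text)

# Inspect the head of the rank-frequency list and the length effect.
print(freqs[:10])
print({length: round(by_len[length], 1) for length in sorted(by_len)[:4]})
```

Plotting `freqs` against rank on log-log axes gives the step-like, roughly power-law curve usually associated with random typing. Under these assumed parameters, shorter words are also more frequent on average.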
The origins of Zipf's meaning‐frequency law
It is shown that a single assumption on the joint probability of a word and a meaning suffices to infer Zipf's meaning‐frequency law or relaxed versions, and can be justified as the outcome of a biased random walk in the process of mental exploration.
The Brevity Law as a Scaling Law, and a Possible Origin of Zipf’s Law for Word Frequencies
A new perspective to establish a connection between different statistical linguistic laws is presented, and a possible model-free explanation for the origin of Zipf's law is found, which should arise as a mixture of conditional frequency distributions governed by the crossover length-dependent frequency.
The evolution of optimized language in the light of standard information theory
Extensions of standard information theory predict that in case of optimal coding, the correlation between word frequency and word length cannot be positive and, in general, it is expected to be negative in concordance with Zipf’s law of abbreviation.
From Boltzmann to Zipf through Shannon and Jaynes
A pure statistical-physics framework is used to describe the probabilities of words, and it is found that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
Scaling Laws for Phonotactic Complexity in Spoken English Language Data
The results support the notion that phonotactic cognition employs information about boundary-spanning phonotactic sequences, and Zipf's law exhibits both high goodness-of-fit and a high scaling coefficient if sequences of more than two sounds are considered.
Optimization Models of Natural Communication
Two important components of the family, namely the information theoretic principles and the energy function that combines them linearly, are reviewed from the perspective of psycholinguistics, language learning, information theory and synergetic linguistics.
Co-occurrence of the Benford-like and Zipf Laws Arising from the Texts Representing Human and Artificial Languages
We demonstrate that large texts, representing human (English, Russian, Ukrainian) and artificial (C++, Java) languages, display quantitative patterns characterized by the Benford-like and Zipf laws.
Brevity is not a universal in animal communication: evidence for compression depends on the unit of analysis in small ape vocalizations
The results indicate that adherence to linguistic laws in male gibbon solos depends on the unit of analysis, and conclude that principles of compression are applicable outside of human language, but may act differently across levels of organization in biological systems.
The placement of the head that maximizes predictability. An information theoretic approach
This paper adds a competing word order principle: the maximization of predictability of a target element to the minimization of the length of syntactic dependencies from the perspective of information theory.


Compression and the origins of Zipf's law of abbreviation
This work generalizes the information theoretic concept of mean code length as a mean energetic cost function over the probability and the magnitude of the types of the repertoire and shows that the minimization of that cost function and a negative correlation between probability and the magnitude of types are intimately related.
Random texts exhibit Zipf's-law-like word frequency distribution
It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The facts that the frequency of
Large-Scale Analysis of Zipf’s Law in English Texts
This work studies three different versions of Zipf’s law by fitting them to all available English texts in the Project Gutenberg database and finds that one of them is able to fit more than 40% of the texts in the database at the 0.05 significance level.
Towards a rigorous motivation for Ziph's law
It is shown that information-theoretic entropy underpins successful models of both types of language evolution and provides a more principled motivation for Zipf’s Law.
The frequency spectrum of finite samples from the intermittent silence process
This work derives and explains how to calculate accurately and efficiently the expected frequency spectrum and the expected vocabulary size as a function of the text size.
Compression as a Universal Principle of Animal Behavior
It is shown that minimizing the expected code length implies that the length of a word cannot increase as its frequency increases, which means that the mean code length or duration is significantly small in human language, and also in the behavior of other species in all cases where agreement with the law of brevity has been found.
Gelada vocal sequences follow Menzerath’s linguistic law
In vocal sequences of wild male geladas (Theropithecus gelada), construct size is negatively correlated with constituent size (duration of calls) and formal mathematical support is provided for the idea that Menzerath’s law reflects compression—the principle of minimizing the expected length of a code.
Finitary models of language users
It is proposed to describe talkers and listeners, that is, the users of language rather than the language itself, just as the authors' knowledge of arithmetic is not merely the collection of their arithmetic responses, habits, or dispositions.
Statistical language learning
Eugene Charniak points out that, as a method of attacking NLP problems, the statistical approach has several advantages: it is grounded in real text and therefore promises to produce usable results, and it offers an obvious way to approach learning.
The Now-or-Never bottleneck: A fundamental constraint on language
It is argued that, to deal with this “Now-or-Never” bottleneck, the brain must compress and recode linguistic input as rapidly as possible, which implies that language acquisition is learning to process, rather than inducing, a grammar.
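
Several of the entries above (for example, "Compression as a Universal Principle of Animal Behavior") state that minimizing the expected code length implies that length cannot increase with frequency. A small brute-force check can illustrate that claim on a toy case. The probabilities, the available lengths, and the function names below are hypothetical and are not taken from any of the listed papers.

```python
from itertools import permutations


def mean_code_length(probs, lengths):
    """Expected length: sum of probability times length over all types."""
    return sum(p * l for p, l in zip(probs, lengths))


def best_assignment(probs, lengths):
    """Try every assignment of lengths to types and keep the cheapest one."""
    return min(
        permutations(lengths),
        key=lambda perm: mean_code_length(probs, perm),
    )


# Hypothetical type probabilities, sorted from most to least frequent.
probs = [0.5, 0.25, 0.15, 0.10]
# Hypothetical set of available code lengths, in arbitrary order.
lengths = [3, 1, 4, 2]

best = best_assignment(probs, lengths)
print(best, mean_code_length(probs, best))
```

In this toy case the cheapest assignment gives the shortest length to the most frequent type and the longest to the least frequent, which is the pattern the law of abbreviation describes. Brute force is used here only because it keeps the check transparent; it is not practical for large repertoires.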