Studies on Zipf's law

Laurence D. Stephens, Henri Guiter, M. V. Arapov
Semantic Stability and Implicit Consensus in Social Tagging Streams
Tagging streams generated by a combination of imitation dynamics and shared background knowledge reach semantic stability faster, and at a higher level, than tagging streams generated by imitation dynamics or natural language phenomena alone.
Statistical Comparability: Methodological Caveats
  • R. Köhler
  • Linguistics
  • Building and Using Comparable Corpora
  • 2013
A number of other frequently used terms and concepts, such as representativeness, homogeneity, and balanced corpora, play a central role in corpus-linguistic argumentation and are also analysed in the paper, as they concern the compilation and use of comparable corpora.
Zipf’s law—another view
This paper proposes a new approach to the problem of Zipf’s Law, based on the assumption that every data set which displays a Zipf-like structure is composed of several system components.
Scaling laws in cognitive sciences
Extending Zipf’s law to n-grams for large corpora
When single words are combined together with word n-grams in one list and put in rank order, the frequency of tokens in the combined list extends Zipf’s law with a slope close to −1 on a log-log plot in all five languages.
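The procedure summarized above (pool single words and word n-grams in one list, rank by frequency, and read the slope off a log-log plot) can be sketched as follows. This is a minimal illustration, not the authors' method: the function names `ngram_counts` and `zipf_slope`, the whitespace tokenization, and the ordinary least-squares fit are all assumptions made here for the example.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count all word n-grams for n = 1..max_n in one combined Counter.

    Hypothetical helper: single words (n = 1) and longer n-grams share
    one frequency list, as in the combined ranking described above.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Assumes at least two strictly positive frequencies. A value near -1
    corresponds to the classic Zipf relation f(r) proportional to 1/r.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy input only; a realistic estimate needs a large corpus.
    text = "the cat sat on the mat and the dog sat on the log"
    counts = ngram_counts(text.split(), max_n=2)
    print(zipf_slope(counts.values()))
```

A toy sentence like the one in the usage section will not reproduce a slope near −1. The sketch only shows how the combined list is built and how a slope would be estimated from it.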
Zipf and Type-Token rules for the English, Spanish, Irish and Latin languages
The Zipf curves of log of frequency against log of rank for a large English corpus of 500 million word tokens, 689,000 word types and for a large Spanish corpus of 16 million word tokens, 139,000
Zipf's data on the frequency of Chinese words revisited
A core (nucleus) of the most frequently used Chinese words is determined by reanalyzing George Zipf's data on the frequency of Chinese words; adding an additional term to Leimkuhler's function leads to a statistically acceptable fit.
Zipf and his heirs
A Quantitative Approach to Lexical Structure of Proverbs
As has been repeatedly pointed out elsewhere, the linguistic structure of proverbs has hardly ever been seriously studied with regard to underlying regularities, and special attention has been paid to the development of adequate methods which go beyond traditional approaches.
Authorship Attribution and Pastiche
Gilbert Adair's pastiche of Lewis Carroll, Alice Through the Needle's Eye, is compared with the original 'Alice' books; a principal component analysis based on word frequencies finds that the main differences are not due to authorship.