Corpus ID: 49566723

Zipf's law in 50 languages: its structural pattern, linguistic interpretation, and cognitive motivation

@article{Yu2018ZipfsLI,
  title={Zipf's law in 50 languages: its structural pattern, linguistic interpretation, and cognitive motivation},
  author={Shuiyuan Yu and Chunshan Xu and Haitao Liu},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.01855}
}
Zipf's law has been found in many human-related fields, including language, where the frequency of a word is persistently found to be a power-law function of its frequency rank. [...] This finding indicates that this deviation is a fundamental and universal feature of word frequency distributions in natural languages, not the statistical error of low-frequency words. A computer simulation based on the dual-process theory yields Zipf's law with the same structural pattern, suggesting…
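The rank-frequency relation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes a single power law f(r) ∝ r^(-α) and estimates α by ordinary least squares on log-log axes, which is exactly the simplification the abstract says real word frequency distributions deviate from. The function names `rank_frequency` and `zipf_exponent` and the toy corpus are hypothetical, made up for illustration only.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs):
    """Estimate alpha in f(r) ~ r**(-alpha) by least squares on log-log axes.

    Assumption: one power law describes all ranks. Requires at least two ranks.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying distribution; alpha is its negation.
    return -sxy / sxx


# Toy corpus (invented data): the word at rank r occurs roughly 120 / r times.
tokens = []
for r in range(1, 21):
    tokens.extend([f"w{r}"] * round(120 / r))

freqs = rank_frequency(tokens)
alpha = zipf_exponent(freqs)
print(f"estimated exponent: {alpha:.2f}")
```

On this synthetic corpus the estimate should come out close to 1, since the frequencies were constructed to fall off as 1/r. On real text a single fitted exponent hides any rank-dependent change of slope, so a log-log plot of `freqs` against rank is a more informative first check than the number alone.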

Zipf’s laws of meaning in Catalan
TLDR
The first study of Zipfian laws relating the frequency of a word with its number of meanings in Catalan is presented, verified via the relationship among their exponents and that of the rank-frequency law.
Shannon entropy as a robust estimator of Zipf's Law in animal vocal communication repertoires
TLDR
The approach for the first time reveals Zipf's law operating in the vocal systems of multiple lineages: songbirds, hyraxes and cetaceans.
Statistical patterns of word frequency suggesting the probabilistic nature of human languages
TLDR
The present study confirms that those important linguistic issues can be translated into probability and frequency patterns in parole, and suggests that human languages may well be probabilistic systems by nature and that statistical patterns may well be inherent properties of human languages.
Language Modeling at Scale
TLDR
This paper shows how Zipf's Law can address bottlenecks in language modeling by grouping parameters for common words and character sequences, because U ≪ N, where U is the number of unique words (types) and N is the size of the training set (tokens).
Parallels of human language in the behavior of bottlenose dolphins
TLDR
Dolphins exhibit striking similarities with humans, and various statistical laws of language that are well known in quantitative linguistics, i.e. Zipf’s law for word frequencies, the law of meaning distribution, and Menzerath's law, have been found in dolphin vocal or gestural behavior.
Towards Explainable Artificial Text Detection
TLDR
This thesis aims to construct a more general approach to detecting machine-generated text, and to this end identifies statistical aspects of language in terms of how artificial and natural texts differ, thereby proving reliable methods in artificial text detection.
Spotting Urdu Stop Words By Zipf's Statistical Approach
TLDR
An innovative method is presented for extracting stop words from a large Urdu text corpus, using Zipf's law of two-factor dependency with a least-effort approach.
Towards the Prepositional Meaning via Machine Learning: A Case Study of Spanish Grammar
TLDR
This article reviews a particular case of semantic universe, the verbs of movement in Spanish, and verifies the hypothesis about the semantic gradualness of the prepositions (HGSS) using tools of the field of machine learning.
Automatic Speech Recognition Datasets in Cantonese Language: A Survey and a New Dataset
TLDR
This paper creates a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK, and analyzes the existing datasets according to their speech type, data source, total size and availability.
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
TLDR
A powerful and robust Cantonese ASR model is created by applying multi-dataset learning on MDCC and Common Voice zh-HK and the results show the effectiveness of the dataset.

References

SHOWING 1-10 OF 44 REFERENCES
Zipf’s word frequency law in natural language: A critical review and future directions
TLDR
It is shown that human language has a highly complex, reliable structure in the frequency distribution over and above Zipf’s law, although prior data visualization methods have obscured this fact.
Least effort and the origins of scaling in human language
  • R. F. Cancho, R. Solé
  • Physics
    Proceedings of the National Academy of Sciences of the United States of America
  • 2003
TLDR
This article explains how language evolution can take advantage of a communicative phase transition and suggests that Zipf's law is a hallmark of symbolic reference and not a meaningless feature.
Reply to "Comment on 'A Scaling law beyond Zipf's law and its relation to Heaps' law'"
The dependence on text length of the statistical properties of word occurrences has long been considered a severe limitation on the usefulness of quantitative linguistics. We propose a simple scaling
Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution
TLDR
It is suggested that Zipf's law might in fact be a fundamental law in natural languages because it is demonstrated that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text.
Mandelbrot's Model for Zipf's Law: Can Mandelbrot's Model Explain Zipf's Law for Language?
  • D. Manin
  • Computer Science
    J. Quant. Linguistics
  • 2009
TLDR
A new modification of the Zipf-Mandelbrot model is proposed that is free from this drawback and allows the optimal information/cost ratio to be achieved via language evolution.
Zipf's law unzipped
TLDR
It is argued that the reason that Zipf's law gives a good description of data from seemingly completely unrelated phenomena is that they can all be described as outcomes of a ubiquitous raison d'action.
Zipf's Law and Avoidance of Excessive Synonymy
TLDR
It is suggested that Zipf's law may result from a hierarchical organization of word meanings over the semantic space, which in turn is generated by the evolution of word semantics dominated by expansion of meanings and competition of synonyms.
Random texts exhibit Zipf's-law-like word frequency distribution
It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The facts that the frequency of
THE WORD FREQUENCY EFFECT AND LEXICAL ACCESS*
Some recent experiments suggest that only open class words show a frequency effect. Closed class items are accessed independently of their frequency. We carried out five experiments to test the
Power laws for monkeys typing randomly: the case of unequal probabilities
TLDR
It is proved that the rank-frequency distribution follows a power law for assignments of probabilities that have rational log-ratios for any pair of keys, and an argument of Montgomery is presented that settles the remaining cases, also yielding a power law.