Fitting Ranked English and Spanish Letter Frequency Distribution in US and Mexican Presidential Speeches

@article{Li2011FittingRE,
  title={Fitting Ranked English and Spanish Letter Frequency Distribution in US and Mexican Presidential Speeches},
  author={Wentian Li and Pedro Miramontes},
  journal={Journal of Quantitative Linguistics},
  year={2011},
  volume={18},
  pages={359--380}
}
Abstract: The limited range in the abscissa of ranked letter frequency distributions means that multiple functions fit the observed distribution reasonably well. To compare these functions critically, we apply statistical model selection to ten functions, using the texts of US and Mexican presidential speeches from the last few centuries. Despite minor switching in the rank order of certain letters over time in both datasets, letter usage is generally stable… 
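As a rough illustration of the model-selection idea in the abstract, the sketch below ranks letter frequencies from a tiny sample and compares three candidate functions by AIC. Everything here is an assumption made for illustration: the three candidates (power law, exponential, and a two-exponent "beta-like" form) are not claimed to be the ten functions the paper tests, the fit is done by ordinary least squares on log-frequency, and the names `ranked_letter_freqs`, `fit_and_score`, and `compare_models` are hypothetical. The sample text is one sentence of the Gettysburg Address, far too short to support any conclusion.

```python
import string
from collections import Counter

import numpy as np

# Toy sample only (one sentence of the Gettysburg Address), not the paper's corpus.
SAMPLE_TEXT = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal."
)


def ranked_letter_freqs(text):
    """Relative frequencies of the letters a-z, sorted from most to least frequent."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    total = sum(counts.values())
    return np.array(sorted((c / total for c in counts.values()), reverse=True))


def fit_and_score(design, log_f):
    """Least-squares fit of log-frequency; returns (coefficients, AIC).

    AIC assumes Gaussian residuals on log-frequency and is given up to an
    additive constant: n * ln(RSS / n) + 2 * (k + 1), where k counts the
    regression coefficients and the extra 1 is the error variance.
    """
    coef, *_ = np.linalg.lstsq(design, log_f, rcond=None)
    rss = float(np.sum((log_f - design @ coef) ** 2))
    n, k = design.shape
    aic = n * np.log(rss / n) + 2 * (k + 1)
    return coef, aic


def compare_models(freqs):
    """Return {model name: AIC} for three illustrative rank-frequency forms."""
    n = len(freqs)
    r = np.arange(1, n + 1, dtype=float)
    log_f = np.log(freqs)
    ones = np.ones(n)
    designs = {
        # ln f = a - b ln r
        "power law": np.column_stack([ones, np.log(r)]),
        # ln f = a - b r
        "exponential": np.column_stack([ones, r]),
        # ln f = a - b ln r + c ln(n + 1 - r)
        "beta-like": np.column_stack([ones, np.log(r), np.log(n + 1 - r)]),
    }
    return {name: fit_and_score(X, log_f)[1] for name, X in designs.items()}


if __name__ == "__main__":
    for name, aic in sorted(compare_models(ranked_letter_freqs(SAMPLE_TEXT)).items(),
                            key=lambda item: item[1]):
        print(f"{name:12s} AIC = {aic:.2f}")
```

Lower AIC indicates a better trade-off between fit and parameter count. Because all three candidates are fitted to the same response (log-frequency), their AIC values are comparable with each other; they would not be comparable with a model fitted to raw frequency.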
Principle of Least Effort and Sentence Length in Public Speaking
TLDR
The analysis of sentence lengths in the inaugural speeches of US presidents and the annual speeches of UK party leaders is carried out and it is shown that the Weibull is the best distribution for describing sentence length.
Models of Lithuanian Grapheme Frequencies
TLDR
Twenty models of Lithuanian grapheme frequencies were examined and a relatively simple model was found to be the best fit, which may also be applicable to other languages.
Large Scale Quantitative Analysis of three Indo-Aryan Languages
TLDR
A thorough quantitative analysis of large scale media text of three Indo-Aryan languages, viz.
Exploring Letter’s Differences between Partial Indonesian Branch Language and English
TLDR
The results show that great differences do exist between three Indonesian-branch languages and English, and the differences between Malay and Indonesian are the smallest.
The 'Letter' Distribution in the Chinese Language
TLDR
The results of the statistical analysis showed that, in different historical periods, the intensity of the use of basic particles in Chinese writing varied, but the form of the distribution was consistent, and the distributions of the Chinese constructive parts are consistent with those of alphabetic writing languages.
The Diary of Boima Kiakpomgbo from Mando Town (Liberia): A Quantitative Study of a Vai Text
TLDR
A Liberian text from the first quarter of the twentieth century written in Vai, a Mande language of West Africa, using an indigenous syllabic script is analysed, finding the Zipf–Mandelbrot law to be a proper model rather than the simple Zipfian dependence.
Approaches to the classification of complex systems: Words, texts, and more
TLDR
The chapter discusses entropy, a parameter that is easily computed from rank–frequency dependences; although it is a discriminating parameter in some problems of classification of complex systems, it can be given a proper interpretation only in a limited class of problems.
Conciseness of Ukrainian, Russian and English: Application to Translation Studies
  • O. Kushnir, O. Dzera, L. Kushnir
  • Computer Science
    2019 XIth International Scientific and Practical Conference on Electronics and Information Technologies (ELIT)
  • 2019
TLDR
This work studies the amount of information contained in symbolic sequences, using a single-character entropy associated with the frequencies of the characters that comprise the alphabet of a coding system, to examine the explicitation hypothesis known from Translation Studies.
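Several of the citing works above rely on an entropy computed from character frequencies. A minimal sketch of one common reading follows, assuming that "single-character entropy" means the first-order Shannon entropy in bits per character; the cited works may define or normalize it differently, and the function name `char_entropy` is hypothetical.

```python
import math
from collections import Counter


def char_entropy(text):
    """First-order Shannon entropy of the characters in `text`, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    # H = -sum_i p_i * log2(p_i), with p_i the relative frequency of character i.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For a sequence in which k distinct characters occur equally often, this returns log2(k), the maximum possible for an alphabet of size k; a sequence made of a single repeated character gives zero.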

References

Comparison of Equations Describing the Ranked Frequency Distributions of Graphemes and Phonemes
TLDR
Examination of 32 text corpora from 18 languages shows that both letter and phoneme frequencies are well described by an equation first developed by Yule and by a parameter‐free equation that also describes the distribution of DNA codons.
Zipf's Laws in Italian Texts
TLDR
The results show that Zipf's law is an adequate model and that the corpus has a unique style, even if the texts were compiled by at least two persons.
On the systematic and system-based study of grapheme frequencies: a re-analysis of German letter frequencies
TLDR
A re-analysis of German data reported by Best (2005) is conducted, concentrating on a detailed examination of parameter behavior, and it is shown that all parameters of this distribution behave regularly, if the analysis is based on the system’s inventory size, rather than on the class of items occurring in the given sample.
A general rule for ranged series of codon frequencies in different genomes.
TLDR
A new model is proposed for a similar distribution in which $p_r = C\,(\ln(n + 1) - \ln r)$, where $n$ is the number of distinct symbols (codons), and statistical criteria show that this model is in good agreement with the ranged series of codon frequencies for the best-studied genomes to date.
Universality of Rank-Ordering Distributions in the Arts and Sciences
TLDR
A universal behavior of the way in which elements of a system are distributed according to their rank with respect to a given property is uncovered, valid for the full range of values, regardless of whether or not a power law has previously been suggested.
Conspiracy in bacterial genomes
Universality and Shannon entropy of codon usage.
The distribution functions of codon usage probabilities, computed over all the available GenBank data for 40 eukaryotic biological species and five chloroplasts, are best fitted by the sum of a
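The logarithmic rule quoted in the references above, p_r = C (ln(n + 1) - ln r), has no free parameter once the probabilities are required to sum to one. The sketch below works under that normalization assumption, which gives C = 1 / (n ln(n + 1) - ln n!); the function name `log_rank_model` is hypothetical, and the source entry does not state how C is fixed.

```python
import math


def log_rank_model(n):
    """Probabilities p_1..p_n under p_r = C * (ln(n + 1) - ln r), normalized to sum to 1.

    Assumes C is fixed by normalization alone:
    sum_r (ln(n + 1) - ln r) = n * ln(n + 1) - ln(n!), and ln(n!) = lgamma(n + 1).
    """
    c = 1.0 / (n * math.log(n + 1) - math.lgamma(n + 1))
    return [c * (math.log(n + 1) - math.log(r)) for r in range(1, n + 1)]
```

Calling `log_rank_model(26)` gives a predicted ranked distribution for a 26-letter alphabet and `log_rank_model(64)` one for the 64 codons; either can be compared with observed ranked frequencies without fitting anything.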