Fitting Ranked English and Spanish Letter Frequency Distribution in US and Mexican Presidential Speeches

@article{Li2011FittingRE,
  title={Fitting Ranked English and Spanish Letter Frequency Distribution in US and Mexican Presidential Speeches},
  author={Wentian Li and Pedro Miramontes},
  journal={Journal of Quantitative Linguistics},
  year={2011},
  volume={18},
  pages={359--380}
}
Abstract: The limited range of the abscissa in ranked letter frequency distributions allows multiple functions to fit the observed distribution reasonably well. To compare these functions critically, we apply statistical model selection to ten functions, using the texts of US and Mexican presidential speeches from the last few centuries. Despite minor switches in the ranking order of certain letters over time in both datasets, letter usage is generally stable…
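As a rough illustration of fitting a candidate function to a ranked letter frequency distribution, the sketch below fits one two-parameter function, assumed here to be the Beta rank function f(r) = C (n + 1 - r)^b / r^a (the "Beta function" compared in the referenced "Fitting Ranked Linguistic Data with Two-Parameter Functions"). On the log scale it becomes a multiple regression, ln f = ln C - a ln r + b ln(n + 1 - r), solved here by ordinary least squares. The function names and the synthetic data are hypothetical; this is not the authors' code, and it does not reproduce their model selection over ten functions.

```python
import math
from typing import List, Sequence, Tuple


def _solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the square system m x = v by Gaussian elimination with partial pivoting."""
    k = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (aug[r][k] - s) / aug[r][r]
    return x


def fit_beta_rank(freqs: Sequence[float]) -> Tuple[float, float, float]:
    """Fit f(r) = C * (n + 1 - r)**b / r**a by least squares on the log scale.

    Hypothetical helper. freqs[0] is the frequency of rank 1, freqs[n - 1] that of
    rank n; all values must be strictly positive so they can be log-transformed.
    Returns (C, a, b).
    """
    n = len(freqs)
    rows = []  # design matrix rows: [1, ln r, ln(n + 1 - r)]
    ys = []    # response: ln f(r)
    for r, f in enumerate(freqs, start=1):
        rows.append([1.0, math.log(r), math.log(n + 1 - r)])
        ys.append(math.log(f))
    # Normal equations (X^T X) beta = X^T y for the three coefficients.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
    b0, b1, b2 = _solve(xtx, xty)
    # Intercept is ln C, the ln r coefficient is -a, the ln(n + 1 - r) coefficient is b.
    return math.exp(b0), -b1, b2


if __name__ == "__main__":
    # Noise-free synthetic data for a 26-letter alphabet (hypothetical parameters).
    n = 26
    synthetic = [0.2 * (n + 1 - r) ** 1.2 / r ** 0.5 for r in range(1, n + 1)]
    print(fit_beta_rank(synthetic))  # approximately (0.2, 0.5, 1.2)
```

Fitting on the log scale weights relative rather than absolute errors, which suits frequencies spanning orders of magnitude; a fit by nonlinear least squares on the raw frequencies could give somewhat different parameters.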

Principle of Least Effort and Sentence Length in Public Speaking

TLDR
Sentence lengths in the inaugural speeches of US presidents and the annual speeches of UK party leaders are analysed, showing that the Weibull distribution best describes sentence length.

Some Statistical Properties of Phonemes in Standard Chinese

TLDR
The results indicate that vowels and nasals are used most frequently, while unaspirated consonants and sounds articulated at the back of the mouth are pervasive in Standard Chinese.

Models of Lithuanian Grapheme Frequencies

TLDR
Twenty models of Lithuanian grapheme frequencies were examined, and a relatively simple model was found to fit best; it may also be applicable to other languages.

Large Scale Quantitative Analysis of three Indo-Aryan Languages

TLDR
A thorough quantitative analysis of large scale media text of three Indo-Aryan languages, viz.

Exploring Letter’s Differences between Partial Indonesian Branch Language and English

TLDR
The results show substantial differences between three Indonesian-branch languages and English, with the smallest differences found between Malay and Indonesian.

The 'Letter' Distribution in the Chinese Language

TLDR
Statistical analysis showed that the intensity with which basic particles are used in Chinese writing varied across historical periods, but the shape of the distribution remained consistent, and the distributions of Chinese constituent parts are consistent with those of alphabetic writing languages.

The Diary of Boima Kiakpomgbo from Mando Town (Liberia): A Quantitative Study of a Vai Text

TLDR
A Liberian text from the first quarter of the twentieth century, written in Vai (a Mande language of West Africa) using an indigenous syllabic script, is analysed; the Zipf–Mandelbrot law proves a more appropriate model than simple Zipfian dependence.

Matrices of the frequency and similarity of Arabic letters and allographs

TLDR
Matrices of the frequency and similarity of Arabic letters and their allographs in the visual and motoric domains, and of the similarities among the letter sounds, will be useful for researchers interested in the processes underpinning orthographic processing, visual word recognition, reading, and literacy acquisition.

Analysis and Mathematical Modelling of the Pattern of Occurrence of Various Devanāgari Letter Symbols according to the Phonological Inventory of Indic Script in Hindi Language

TLDR
This paper analyses the pattern of occurrence of the letters of the Hindi alphabet (varṇamālā) in Hindi texts and corpora using linear regression with one and two independent variables.

References

Showing references 1-10 of 49

Fitting Ranked Linguistic Data with Two-Parameter Functions

TLDR
This paper compares the fitting performance of several two-parameter models, including the Beta, Yule and Weibull functions, all of which can be framed as a multiple regression on the logarithmic scale, on several kinds of ranked linguistic data, such as letter frequencies, word spacings and word frequencies.

Comparison of Equations Describing the Ranked Frequency Distributions of Graphemes and Phonemes

TLDR
Examination of 32 text corpora from 18 languages shows that both letter and phoneme frequencies are well described by an equation first developed by Yule and by a parameter‐free equation that also describes the distribution of DNA codons.

Zipf's Laws in Italian Texts

TLDR
The results show that Zipf's law is an adequate model and that the corpus has a unique style, even though the texts were compiled by at least two people.

Letter, Grapheme and (Allo-)Phone Frequencies: The Case of Slovak

TLDR
It is shown that the frequencies of Slovak letters, graphemes and allophones can be modeled by the negative hypergeometric distribution, indicating a relative under-exploitation and/or over-exploitation of letter and grapheme (LG) units compared with the allophone inventory.

On the systematic and system-based study of grapheme frequencies: a re-analysis of German letter frequencies

TLDR
German letter-frequency data reported by Best (2005) are re-analysed with a detailed examination of parameter behavior, showing that all parameters of the fitted distribution behave regularly if the analysis is based on the system’s inventory size rather than on the class of items occurring in the given sample.

Two frequency-rank law for letters printed in Romanian

TLDR
This paper investigates how written Romanian follows two frequency-rank laws that have been reported to hold for several other natural written languages.

A general rule for ranged series of codon frequencies in different genomes

TLDR
A new model is proposed for a similar distribution, in which $p_r = C\,(\ln(n+1) - \ln r)$, where $r$ is the rank and $n$ is the number of distinct symbols (codons); statistical criteria show that this model agrees well with the ranked series of codon frequencies in the best-studied genomes to date.

Universality of Rank-Ordering Distributions in the Arts and Sciences

TLDR
A universal pattern is uncovered in how the elements of a system are distributed by rank with respect to a given property, valid over the full range of values, regardless of whether a power law had previously been suggested.

Conspiracy in bacterial genomes