Word-length Entropies and Correlations of Natural Language Written Texts

@article{KalimeriWordLengthEntropies,
  title={Word-length Entropies and Correlations of Natural Language Written Texts},
  author={Maria Kalimeri and Vassilios Constantoudis and Constantinos Papadimitriou and Konstantinos Karamanos and Fotis K. Diakonos and Haris Papageorgiou},
  journal={Journal of Quantitative Linguistics},
  pages={101--118}
}
Abstract: We study the frequency distributions and correlations of the word lengths of 10 European languages. Our findings indicate that (a) the word-length distribution of short words quantified by the mean value and the entropy distinguishes the Uralic (Finnish) corpus from the others, (b) the tails at long words, manifested in the high-order moments of the distributions, differentiate the Germanic languages (except for English) from the Romanic languages and Greek and (c) the correlations… 
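The quantities the abstract compares across languages can be illustrated with a short sketch: the word-length probability distribution, its mean, and its Shannon entropy. This is a minimal Python sketch under simplifying assumptions (whitespace tokenization, entropy in bits); it does not reproduce the paper's corpora or preprocessing.

```python
from collections import Counter
import math

def word_length_stats(text):
    """Return the word-length probability distribution of a text,
    together with its mean length and Shannon entropy (in bits).
    Tokenization by whitespace is a simplifying assumption."""
    lengths = [len(w) for w in text.split()]
    total = len(lengths)
    probs = {L: c / total for L, c in Counter(lengths).items()}
    mean = sum(L * p for L, p in probs.items())
    entropy = -sum(p * math.log2(p) for p in probs.values())
    return probs, mean, entropy

probs, mean, entropy = word_length_stats(
    "the quick brown fox jumps over the lazy dog")
```

Higher-order moments of `probs` (the tails at long words mentioned in point (b)) can be computed the same way by summing `L**k * p` over the distribution.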
Evaluating the Irregularity of Natural Languages
The results revealed that real texts have non-trivial structure compared to texts obtained from randomization procedures, as shown by multiscale entropy analysis.
Word-Length Correlations and Memory in Large Texts: A Visibility Network Analysis
This work studies the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project using the natural visibility graph method (NVG), and suggests that word lengths are much more strongly correlated at large distances between words than at short distances.
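The natural visibility graph construction mentioned above can be sketched in a few lines: each sample of the series becomes a node, and two samples are linked if the straight line between them passes above every intermediate sample. This is a simplified O(n²) sketch applied to a toy sequence, not the original analysis of full ebooks.

```python
def natural_visibility_graph(series):
    """Return the edge set of the natural visibility graph (NVG):
    samples (i, y_i) and (j, y_j) are connected iff every intermediate
    sample k lies strictly below the straight line joining them."""
    n = len(series)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[j]
                + (series[i] - series[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges

# Toy word-length sequence: adjacent samples are always mutually visible.
edges = natural_visibility_graph([1, 3, 1, 3])
```

Correlation properties are then read off network statistics (e.g. degree distributions) of the resulting graph.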
Recurrence Networks in Natural Languages
The application of a linear discriminant analysis leads to well-separated clusters of language families based on the network-density properties, which show similar average values of density among languages that belong to the same linguistic family.
Quantifying Evolution of Short and Long-Range Correlations in Chinese Narrative Texts across 2000 Years
It is speculated that the increase of word length and sentence length in written Chinese may account for this phenomenon, in terms of both the social-cultural aspects and the self-adapting properties of language structures.
Entropy in different text types
The present investigation examines how the unique linguistic profile of different text types is reflected in their respective entropy characteristics, and shows a strikingly similar distribution pattern in Chinese and English for the relative entropy of word-forms and POS-forms at different sentential positions.
A comparative study of power law scaling in large word-length sequences
A study of the correlation of lengths of words in large literary texts is presented. We use statistical tools based on the Allan factor and fractal dimension to estimate the fractal indices.
Can the Probability Distribution of Dependency Distance Measure Language Proficiency of Second Language Learners?
This study corroborates that quantitative linguistic methods can be well utilized in second language acquisition research, finding that the Zipf-Alekseev distribution well captures the probability distribution of dependency distance of each grade and of native speakers.
Entropic Analysis of Garhwali Text
In the present study, a systematic statistical analysis has been performed on the words of a continuous Garhwali speech corpus. The words of Garhwali in the continuous speech corpus are taken from


Entropy Analysis of Word-Length Series of Natural Language Texts: Effects of Text Language and Genre
It is found that the n-gram entropies of natural language texts in word-length representation are sensitive to text language and genre and this sensitivity is attributed to changes in the probability distribution of the lengths of single words.
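The n-gram entropies of a word-length representation can be sketched as follows. This is a minimal sketch assuming overlapping n-grams and entropy in bits; the paper's exact estimator, corpora, and genre comparisons are not reproduced.

```python
from collections import Counter
import math

def ngram_entropy(lengths, n):
    """Shannon entropy (in bits) of the overlapping n-grams of a
    word-length sequence: a text is reduced to the sequence of its
    word lengths, and blocks of n consecutive lengths are counted."""
    grams = [tuple(lengths[i:i + n]) for i in range(len(lengths) - n + 1)]
    total = len(grams)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(grams).values())

# A constant sequence carries no block-entropy; alternation carries some.
h_const = ngram_entropy([4, 4, 4, 4], 2)
h_alt = ngram_entropy([3, 5, 3, 5, 3, 5], 2)
```

Sensitivity to language and genre then shows up as systematic differences in these entropies between corpora, driven by the single-word length distribution.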
Word Length in Portuguese Texts
The hypothesis that word length distributions in texts are not chaotic but abide by specific laws has already been proven for many different languages, and it therefore forms the basis for an examination like the present one.
Model generation for word length frequencies in texts with the application of Zipf's order approach
The applicability of the generated mathematical model for word length frequencies was verified, and the problem of establishing a relationship between word frequencies of higher Zipf's order and text length was resolved.
Universal Entropy of Word Ordering Across Linguistic Families
A relative entropy measure is computed to quantify the degree of ordering in word sequences from languages belonging to several linguistic families; the results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering on the structure of language is a statistical linguistic universal.
Quantifying the information in the long-range order of words: Semantic structures and universal linguistic constraints
The study of word length has an almost 150-year long history: it was on August 18, 1851, when Augustus de Morgan, the well-known English mathematician and logician (1806–1871), in a letter to a
Word-Length Distribution in English Press Texts
This study examines if the same mathematical model applies to the distribution of word length in daily and weekly English press texts alike and if there is a difference between them at all.