Traces of the Mouth: Andrei Andreyevich Markov’s Mathematization of Writing


This article discusses in detail two works by the Russian mathematician Andrei Andreyevich Markov Sr (1856–1922). They represent an early and momentous attempt to understand the phenomenon of language in mathematical terms. Outside the strictly mathematical field, Markov’s achievements are only very rarely discussed. In these works, he counted the frequency of vowels and consonants in Pushkin’s Eugene Onegin and in another text, and analysed the results with the mathematical tools of the probability theory of his time. In what follows, I give a brief account of the role that letters had played in probability theory up to that point. The understanding of language embodied in these concepts was so weak that it did not allow even very simple problems to be solved. I then describe Markov’s analysis in detail. From 1906 onwards, Markov had extended, on a purely theoretical basis, certain concepts of probability theory that were considered to apply only to independent trials to the field of dependent variables. In Pushkin’s text, Markov found for the first time material with which to verify his assumptions empirically. Since this was his primary interest, he made no further comment on the meaning of his findings. The first astonishing result was that the frequencies of vowels and consonants followed a ‘normal’ distribution. Although Markov did not say as much, this means that a random process lies at the source of language. I attempt to find an explanation for this in the lectures that the Swiss linguist Ferdinand de Saussure gave at approximately the same time, which offer a helpful theory of the collective genesis of language. Even though Markov and Saussure were probably unacquainted with each other’s work, they shared a strong interest in formalization and an approach that is differential rather than substantial.
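The counting procedure itself is simple enough to sketch in code. The following Python fragment is an illustration only and not Markov’s own procedure, which was carried out by hand. It maps a text to a sequence of vowels and consonants, estimates how often a vowel follows a vowel or a consonant, and compares the variance of vowel counts per block with the variance that independent trials would produce. The function names, the English vowel set and the block size of 100 letters are assumptions made for this sketch.

```python
from statistics import pvariance

# Assumption: English vowels stand in for the Russian ones Markov counted.
VOWELS = set("aeiou")


def to_sequence(text):
    """Map each letter to 1 (vowel) or 0 (consonant); drop everything else."""
    return [1 if ch in VOWELS else 0 for ch in text.lower() if ch.isalpha()]


def transition_probabilities(seq):
    """Estimate P(vowel), P(vowel | previous vowel), P(vowel | previous consonant).

    Assumes the sequence contains at least one vowel and one consonant
    that are each followed by another letter.
    """
    after = {0: [0, 0], 1: [0, 0]}  # after[prev] = [pairs seen, vowels that follow]
    for prev, cur in zip(seq, seq[1:]):
        after[prev][0] += 1
        after[prev][1] += cur
    p = sum(seq) / len(seq)
    p_after_vowel = after[1][1] / after[1][0]
    p_after_consonant = after[0][1] / after[0][0]
    return p, p_after_vowel, p_after_consonant


def dispersion_ratio(seq, block=100):
    """Observed variance of vowel counts per block, divided by the
    binomial variance block * p * (1 - p) expected for independent trials."""
    counts = [sum(seq[i:i + block]) for i in range(0, len(seq) - block + 1, block)]
    p = sum(counts) / (block * len(counts))
    return pvariance(counts) / (block * p * (1 - p))
```

Under these assumptions, a ratio close to 1 is what independent trials would give, while a ratio below 1 indicates that the vowel counts vary less from block to block than independence would predict. A strictly alternating sequence such as `[0, 1] * 300` is the extreme case: every block holds the same number of vowels, so the ratio is 0.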
By applying Markov’s analysis to randomly selected words, I demonstrate first that it is not the individual style of an author that produces the observed randomness. It is rather the fact, as stated by Saussure, that language is formed in an unconscious, collective process. Certain individuals begin to speak differently, and their changes to the language may or may not be accepted by others. Markov’s second result was that the dispersion of this random distribution is much smaller than would be expected. Again, Markov applied the theoretical formulae of his earlier work only to verify their validity, and did not enquire into the reason for this phenomenon. Saussure’s theory provides an explanation: the few individuals who start to speak differently are subject to the physical constraints imposed by the mouth and thus cannot recombine letters completely at random. Therefore,


Cite this paper

@inproceedings{Link2004TracesOT, title={Traces of the Mouth: Andrei Andreyevich Markov’s Mathematization of Writing}, author={David Link}, year={2004} }