Analytical expression of the purine/pyrimidine autocorrelation function after and before random mutations.

Abstract

The mutation process is a classical evolutionary genetic process. The type of mutation studied here is the random substitution of a purine base R (adenine or guanine) by a pyrimidine base Y (cytosine or thymine) and reciprocally (transversions). The analytical expressions derived allow us to analyze the occurrence probabilities in genes of motifs and d-motifs (two motifs separated by any d bases) on the R/Y alphabet under transversions. These motif probabilities can be obtained after transversions (in the evolutionary sense, from the past to the present) and, unexpectedly, also before transversions (after back transversions, in the inverse evolutionary sense, from the present to the past). This theoretical part in Section 2 is a first generalization of a particular formula recently derived. The application in Section 3 is based on the analytical expression giving the autocorrelation function (the d-motif probabilities) before transversions. It allows us to study primitive genes from actual genes. This approach solves a biological problem. The protein coding genes of chloroplasts and mitochondria have a preferential occurrence of the 6-motif YRY(N)6YRY (maximum of the autocorrelation function for d = 6, N = R or Y) with a periodicity modulo 3. The YRY(N)6YRY preferential occurrence without the periodicity modulo 3 is also observed in the RNA coding genes (ribosomal, transfer, and small nuclear RNA genes) and in the noncoding genes (introns and 5' regions of eukaryotic nuclei). However, there are two exceptions to this YRY(N)6YRY rule: the protein coding genes of eukaryotic nuclei and those of prokaryotes, where YRY(N)6YRY has the second highest value after YRY(N)0YRY (YRYYRY) with a periodicity modulo 3. When we go backward in time with the analytical expression, the protein coding genes of both eukaryotic nuclei and prokaryotes retrieve the YRY(N)6YRY preferential occurrence with a periodicity modulo 3 after 0.2 back transversions per base.
In other words, the actual protein coding genes of chloroplasts and mitochondria are similar to the primitive protein coding genes of eukaryotic nuclei and prokaryotes. On the other hand, this application represents the first result concerning the mutation process in the model of DNA sequence evolution we recently proposed. According to this model, the actual genes on the R/Y alphabet derive from two successive evolutionary genetic processes: an independent mixing of a few nonrandom types of oligonucleotides leading to genes called primitive, followed by a mutation process in these primitive genes. (ABSTRACT TRUNCATED AT 400 WORDS)
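
To make the terms d-motif and autocorrelation function concrete, the following is a minimal Python sketch that counts the empirical frequency of YRY(N)dYRY in a sequence on the R/Y alphabet and simulates forward transversions. It is not the analytical expression of the paper, which the abstract does not give. All function names are hypothetical, and the per-base flip probability p is an assumed simplification that need not match the paper's "transversions per base" parameterization. Back transversions are not simulated, since they rely on the analytical expression.

```python
import random

# Purines (A, G) map to R; pyrimidines (C, T) map to Y.
_RY_TABLE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
_FLIP = {"R": "Y", "Y": "R"}


def to_ry(dna):
    """Translate a DNA string to the R/Y alphabet (KeyError on other symbols)."""
    return "".join(_RY_TABLE[base] for base in dna.upper())


def d_motif_frequency(ry, d, motif="YRY"):
    """Fraction of windows where `motif` occurs twice, separated by any d bases."""
    k = len(motif)
    span = 2 * k + d                      # total length of motif + (N)d + motif
    n_windows = len(ry) - span + 1
    if n_windows <= 0:
        return 0.0
    hits = sum(
        1
        for i in range(n_windows)
        if ry[i:i + k] == motif and ry[i + k + d:i + span] == motif
    )
    return hits / n_windows


def autocorrelation(ry, max_d, motif="YRY"):
    """Empirical autocorrelation function: d-motif frequency for d = 0..max_d."""
    return [d_motif_frequency(ry, d, motif) for d in range(max_d + 1)]


def apply_transversions(ry, p, rng):
    """Flip each base R<->Y independently with probability p (assumed model)."""
    return "".join(_FLIP[b] if rng.random() < p else b for b in ry)


if __name__ == "__main__":
    rng = random.Random(0)                # fixed seed for reproducibility
    gene = to_ry("ACGTTGCATGCCATGCATTGCA" * 20)   # toy sequence, not real data
    mutated = apply_transversions(gene, 0.2, rng)
    for d, (before, after) in enumerate(
        zip(autocorrelation(gene, 8), autocorrelation(mutated, 8))
    ):
        print(d, round(before, 4), round(after, 4))
```

Plotting the returned list against d is one way to look for a maximum at d = 6 or a periodicity modulo 3 in a given gene population, under the assumptions stated above.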

Cite this paper

@article{Arqus1994AnalyticalEO, title={Analytical expression of the purine/pyrimidine autocorrelation function after and before random mutations.}, author={Didier Arqu{\`e}s and Christian J. Michel}, journal={Mathematical Biosciences}, year={1994}, volume={123}, number={1}, pages={103--125} }