Statistics of trinucleotides in coding sequences and evolution.


The aim of this paper is to give measurements indicative of the evolutionary stages of species. Two types of trinucleotide statistics in coding regions are analysed for 27 species. The first is the codon space: the nucleotide ratios at each of the three codon positions. We apply principal component analysis to this space and extract two principal components that faithfully describe the original distribution of the codon space. The first principal component corresponds to the GC content. The second principal component classifies the species into three evolutionary groups: Archaea, Bacteria and Eukaryota. The second statistic compares the real and theoretical frequencies of amino acids. The real frequency of an amino acid in a coding sequence is its frequency in the translated protein. The theoretical frequency is the expected frequency calculated from the nucleotide ratios. We introduce the discrepancy between these two frequencies as an index of the non-randomness of nucleotides in the sequence. This index divides the species into two groups: eukaryotes, which have lower non-randomness (i.e. are more random), and prokaryotes, which have higher non-randomness.
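
The two statistics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions the abstract does not confirm: the theoretical frequency is taken as the product of position-specific nucleotide ratios summed over the codons of each amino acid (standard genetic code, stop codons excluded and the result renormalised), and the discrepancy is taken as the sum of absolute differences between real and theoretical frequencies. All function names are hypothetical, and the input is assumed to be an in-frame sequence containing only A, C, G and T.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def position_ratios(seq):
    """Nucleotide ratios at each of the three codon positions (12 numbers)."""
    cs = codons(seq)
    ratios = []
    for pos in range(3):
        counts = Counter(c[pos] for c in cs)
        ratios.append({b: counts[b] / len(cs) for b in BASES})
    return ratios


def real_frequencies(seq):
    """Amino acid frequencies in the translated protein (stops ignored)."""
    counts = Counter(CODON_TABLE[c] for c in codons(seq))
    counts.pop("*", None)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}


def theoretical_frequencies(seq):
    """Expected amino acid frequencies if the three positions were independent.

    Assumption: position-specific ratios are used; the paper may define
    the expectation differently.
    """
    r = position_ratios(seq)
    freq = Counter()
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            freq[aa] += r[0][codon[0]] * r[1][codon[1]] * r[2][codon[2]]
    total = sum(freq.values())
    return {a: p / total for a, p in freq.items()}


def discrepancy(seq):
    """Hypothetical non-randomness index: sum of |real - theoretical|."""
    real = real_frequencies(seq)
    theo = theoretical_frequencies(seq)
    return sum(abs(real.get(a, 0.0) - theo.get(a, 0.0))
               for a in set(real) | set(theo))
```

Under these assumptions, the 12 values returned by `position_ratios` for each species would form one row of the matrix passed to principal component analysis, and `discrepancy` would yield one non-randomness value per sequence.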


Cite this paper

@article{Takeuchi2003StatisticsOT,
  title={Statistics of trinucleotides in coding sequences and evolution.},
  author={Fumihiko Takeuchi and Y. Futamura and Hiroshi Yoshikura and Kenji Yamamoto},
  journal={Journal of theoretical biology},
  year={2003},
  volume={222},
  number={2},
  pages={139--149}
}