Measuring the Coding Potential of Genomic Sequences Througha Combination of Triplet Occurrence Patterns and RNY Preference
A novel bias in codon third-letter usage was found in Escherichia coli genes with low fractions of "optimal codons", by comparing intact sequences with control random sequences. Third-letter usage has been found to be biased according to preference in codon usage and to doublet preference from the following first letter. The present study examines third-letter usage in the context of the nucleotide sequence when these preferences are considered. In order to exclude any influence by these factors, the random sequences were generated such that the amino acid sequence, codon usage, and the doublet frequency in each gene were all preserved. Comparison of intact sequences with these randomly generated sequences reveals that third letters of codons show a strong preference for the purine/pyrimidine pattern of the next codons: purine (R) is preferred to pyrimidine (Y) at the third site when followed by an R-Y-R codon, and pyrimidine is preferred when followed by an R-R-Y, an R-Y-Y or a Y-R-Y codon. This bias is probably related to interactions of tRNA molecules in the ribosome.