Reduction of protein sequence complexity by residue grouping.


It is well known that many naturally occurring amino acids resemble one another. The complexity of protein systems can therefore be reduced by sorting similar amino acids into groups, so that protein sequences can be rewritten in a reduced alphabet. This paper discusses how to group similar amino acids and whether there is a minimal amino acid alphabet with which proteins can still fold. Reduced alphabets are obtained by retaining the maximal information in the simplified protein sequence relative to the parent sequence, as measured by global sequence alignment. With these reduced alphabets and correspondingly simplified similarity matrices, protein folds are recognized from the similarity score of the sequence alignment. For each level of reduction in the number of amino acid types, the coverage on the SCOP40 dataset is computed, defined as the ratio of the number of homologous pairs detected by the program BLAST to the number marked as homologous in SCOP40. For reduced alphabets containing 10 types of amino acids, the ability to detect distantly related folds remains at almost the same level as with the full alphabet of 20 types, which implies that 10 types of amino acids may be the number of degrees of freedom needed to characterize the complexity of proteins.
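The two mechanical steps described in the abstract, rewriting a sequence in a reduced alphabet and computing coverage as a ratio of detected to marked homologous pairs, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the 10-group partition `EXAMPLE_GROUPS` is an assumption made for the example (the abstract does not list the groups), and the function names `build_mapping`, `simplify`, and `coverage` are hypothetical. The alignment-based search for the best grouping and the BLAST runs themselves are not reproduced here.

```python
# Illustrative 10-group partition of the 20 amino acids.
# Assumption for this sketch: the abstract does not state the actual groups.
EXAMPLE_GROUPS = ["C", "FYW", "ML", "IV", "G", "P", "ATS", "NH", "QED", "RK"]


def build_mapping(groups):
    """Map every residue to the first letter of its group (the representative)."""
    mapping = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            if residue in mapping:
                raise ValueError(f"residue {residue!r} appears in two groups")
            mapping[residue] = representative
    return mapping


def simplify(sequence, mapping):
    """Rewrite a protein sequence using the reduced alphabet."""
    return "".join(mapping[residue] for residue in sequence.upper())


def coverage(detected_pairs, marked_pairs):
    """Fraction of marked homologous pairs that were also detected.

    Pairs are treated as unordered, so ("a", "b") equals ("b", "a").
    """
    marked = {frozenset(pair) for pair in marked_pairs}
    detected = {frozenset(pair) for pair in detected_pairs}
    if not marked:
        raise ValueError("marked_pairs must not be empty")
    return len(detected & marked) / len(marked)


if __name__ == "__main__":
    mapping = build_mapping(EXAMPLE_GROUPS)
    print(simplify("MKVLA", mapping))
    # Hypothetical domain identifiers, used only to exercise the function.
    print(coverage([("d1", "d2"), ("d3", "d4")], [("d2", "d1"), ("d5", "d6")]))
```

Counting only detected pairs that are also marked keeps the ratio between 0 and 1; whether false positives are handled this way in the paper is not stated in the abstract.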

Cite this paper

@article{Li2003ReductionOP, title={Reduction of protein sequence complexity by residue grouping}, author={Tanping Li and Ke Fan and Jun Wang and Wei Wang}, journal={Protein Engineering}, year={2003}, volume={16}, number={5}, pages={323--330} }