Assessment of protein sequence identity from amino acid composition data.


A new index is proposed for assessing the extent of composition divergence between two proteins of equal length. It is defined as half the sum of squares of the differences between the numbers of residues of each type in the two proteins. It is an unbiased estimator of the number of differences between the two sequences, with a coefficient of variation of about 0.4. For unrelated proteins of length N the index is expected to exceed 0.42 N in about 95% of comparisons. The index can also be defined for pairs of proteins of which one is about double the length of the other. Recent data for glucokinase and hexokinase type II, both from rats, are used to illustrate the analysis proposed, and suggest that the two sequences are about 85% identical. Of other indexes currently in use, the one proposed by Marchalonis & Weltman (1971) appears to be the most easily interpretable and is simply related to the one proposed in this paper.
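The definition given in the abstract can be illustrated with a minimal Python sketch. It covers only the equal-length case and the 0.42 N figure quoted above. The function names, the use of one-letter sequence strings as input, and the choice to reject unequal lengths are assumptions made for illustration; they are not taken from the paper, which works from composition data and also treats the double-length case not shown here.

```python
from collections import Counter


def composition_divergence(counts_a, counts_b):
    """Half the sum of squared differences between residue counts.

    counts_a and counts_b map residue type -> number of residues.
    Assumes both proteins have the same total length (hypothetical
    check added for this sketch).
    """
    if sum(counts_a.values()) != sum(counts_b.values()):
        raise ValueError("this sketch handles only proteins of equal length")
    residues = set(counts_a) | set(counts_b)
    # A residue type absent from one protein counts as zero there.
    return 0.5 * sum(
        (counts_a.get(r, 0) - counts_b.get(r, 0)) ** 2 for r in residues
    )


def divergence_from_sequences(seq_a, seq_b):
    """Convenience wrapper: derive compositions from sequence strings."""
    return composition_divergence(Counter(seq_a), Counter(seq_b))


def exceeds_unrelated_level(index, length):
    """True if the index is above 0.42 * N, the level the abstract says
    unrelated proteins of length N exceed in about 95% of comparisons."""
    return index > 0.42 * length


if __name__ == "__main__":
    # Toy sequences, not real data: they differ by one substitution (G -> L).
    index = divergence_from_sequences("AAGG", "AAGL")
    print(index, exceeds_unrelated_level(index, 4))
```

In the toy example a single substitution changes two residue counts by one each, so the index equals 1, matching the one real difference between the sequences.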

Cite this paper

@article{CornishBowden1977AssessmentOP, title={Assessment of protein sequence identity from amino acid composition data.}, author={Athel Cornish-Bowden}, journal={Journal of Theoretical Biology}, year={1977}, volume={65}, number={4}, pages={735--742} }