Identification of Ortholog Groups in KEGG/SSDB by Considering Domain Structures


Recent genome projects have deposited a huge amount of genome information in databases. Although protein sequences can be predicted effectively from these genomes, the functions of most proteins have not been determined experimentally. Computational methods based on the comparison and clustering of protein sequences are therefore essential for function prediction. However, complications arise because the unit of conservation is not the entire protein molecule but the domain, which is a part of the protein molecule. Hence, a method that classifies proteins according to their domain structures must be developed for use in functional prediction. Here, we propose a method for extracting domain information from a cluster of similar proteins obtained by all-to-all pairwise sequence comparisons of completely sequenced genomes.

