Automatic classification of protein structures relying on similarities between alignments
The classification of protein 3D structures into structural families is reformulated as graph-based clustering of modular objects: the similarity between two 3D structures relies on the local similarities of their matching substructures. Similarities between 3D structures are represented as edges connecting objects in a graph. Applying clustering algorithms directly to such a graph has the following drawback: a subset of more than two 3D structures belonging to the same cluster may share no similar substructure. To overcome this drawback, we propose introducing constraints on ternary similarities, <i>i.e.</i> constraints on triples of objects. The graph of 3D structures is first transformed into its line graph, which represents the adjacencies between the edges of the graph. The ternary constraints are applied to the line graph, and a maximal line graph is then extracted from the modified line graph. The corresponding graph of 3D structures now satisfies the above-mentioned ternary constraints. In our experiments, clustering the new graph yields a more stable classification that is coherent with the expert classification SCOP.
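
The first two steps described above (building the line graph, then applying a ternary constraint to it) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `region_on` mapping (which residues of a shared structure are covered by each pairwise alignment), and the overlap threshold `min_shared` are all hypothetical choices made for illustration. The extraction of a maximal line graph is not shown.

```python
from itertools import combinations


def line_graph(edges):
    """Return the links of the line graph: pairs of edges sharing an endpoint."""
    edges = [frozenset(e) for e in edges]
    return {frozenset((e, f)) for e, f in combinations(edges, 2) if e & f}


def apply_ternary_constraint(links, region_on, min_shared=1):
    """Keep a link only if both alignments overlap on the shared structure.

    region_on[(edge, node)] is the (assumed) set of residue indices of `node`
    covered by the pairwise alignment that `edge` represents.
    """
    kept = set()
    for link in links:
        e, f = tuple(link)
        # Two distinct edges of a simple graph share exactly one node.
        (shared,) = e & f
        overlap = region_on[(e, shared)] & region_on[(f, shared)]
        if len(overlap) >= min_shared:
            kept.add(link)
    return kept


# Toy data (invented): structure B is similar to A, C and D,
# but through different regions of B.
AB, BC, BD = frozenset("AB"), frozenset("BC"), frozenset("BD")
edges = [("A", "B"), ("B", "C"), ("B", "D")]
region_on = {
    (AB, "B"): set(range(1, 51)),     # A-B alignment covers residues 1..50 of B
    (BC, "B"): set(range(100, 151)),  # B-C alignment covers residues 100..150 of B
    (BD, "B"): set(range(10, 41)),    # B-D alignment covers residues 10..40 of B
}

links = line_graph(edges)
kept = apply_ternary_constraint(links, region_on)
print(len(links), len(kept))
```

In this toy example all three edges are adjacent in the line graph because they meet at B, but only the A-B and B-D alignments cover a common region of B, so only that link survives the constraint. This is the situation the abstract describes: A, B and C could be clustered together through pairwise similarities even though the triple shares no common substructure.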