Finding frequent patterns in graph-based datasets is an important problem with applications in drug discovery, protein structure analysis, XML querying, and social network analysis, among others. In this paper we propose a framework to mine frequent large-scale structures, formally defined as frequent topological structures, from…
The protein domain has been one of the largest areas of bioinformatics and data mining research. These efforts have included protein structure prediction, folding pathway prediction, sequence alignment, ab initio simulation, structure alignment, substructure detection, and many others. Substructure detection is generally defined as the mining of a…
Motivation: Here we present two approaches to solving the substructure similarity problem. The first approach is a parallel version of a previously developed substructure mining algorithm. The second is an algorithm that operates on the protein domain and uses a protein's amino acid sequence in addition to substructures to determine similarity.