Delaunay tessellation is applied for the first time in the analysis of protein structure. By representing amino acid residues in protein chains by C alpha atoms, the protein is described as a set of points in three-dimensional space. Delaunay tessellation of a protein structure generates an aggregate of space-filling irregular tetrahedra, or Delaunay…
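To make the tessellation idea concrete, here is a minimal sketch rather than the authors' implementation. It uses the empty-circumsphere property: four points form a Delaunay tetrahedron when no other point lies strictly inside the sphere through them. The brute-force search is only practical for a handful of points, and the names `det3`, `circumsphere`, `delaunay_tetrahedra`, and the toy coordinates in `ca_atoms` are hypothetical.

```python
from itertools import combinations


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(a, b, c, d):
    """Return (center, radius squared) of the sphere through four points,
    or None when the points are (nearly) coplanar."""
    # Equal distance to a and to p gives: 2 (p - a) . x = |p|^2 - |a|^2
    rows = [[2.0 * (p[k] - a[k]) for k in range(3)] for p in (b, c, d)]
    rhs = [sum(p[k] ** 2 - a[k] ** 2 for k in range(3)) for p in (b, c, d)]
    det = det3(rows)
    if abs(det) < 1e-12:
        return None
    center = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        center.append(det3(m) / det)
    r2 = sum((center[k] - a[k]) ** 2 for k in range(3))
    return center, r2


def delaunay_tetrahedra(points, eps=1e-9):
    """Brute-force Delaunay tessellation: keep every 4-tuple of point
    indices whose circumsphere contains no other point."""
    tetrahedra = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue
        center, r2 = sphere
        others = (j for j in range(len(points)) if j not in quad)
        if all(sum((points[j][k] - center[k]) ** 2 for k in range(3)) >= r2 - eps
               for j in others):
            tetrahedra.append(quad)
    return tetrahedra


# Toy "C alpha" coordinates (made up for illustration, not real protein data).
ca_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
            (0.0, 0.0, 1.0), (2.0, 2.0, 2.0)]
print(delaunay_tetrahedra(ca_atoms))
```

For real structures with hundreds of residues, a library routine such as `scipy.spatial.Delaunay` would be the practical choice; the brute-force version is shown only because it exposes the defining property directly.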
Finding recurring residue packing patterns, or spatial motifs, that characterize protein structural families is an important problem in bioinformatics. We apply a novel frequent subgraph mining algorithm to three graph representations of protein three-dimensional (3D) structure. In each protein graph, a vertex represents an amino acid. Vertex-residues are…
One of the most important characteristics of Quantitative Structure-Activity Relationship (QSAR) models is their predictive power. The latter can be defined as the ability of a model to predict accurately the target property (e.g., biological activity) of compounds that were not used for model development. We suggest that this goal can be achieved by…
Three-dimensional structure and amino acid sequence of proteins are related by an unknown set of rules that is often referred to as the folding code. This code is believed to be significantly influenced by nonlocal interactions between the residues. A quantitative description of nonlocal contacts requires the identification of neighboring residues. We…
We have developed quantitative structure-activity relationship (QSAR) models for 44 non-nucleoside HIV-1 reverse transcriptase inhibitors (NNRTIs) of the pyridinone derivative type. The k nearest neighbor (kNN) variable selection approach was used. This method utilizes multiple descriptors such as molecular connectivity indices, which are derived from…
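As a rough illustration of the kNN idea, the sketch below predicts a compound's activity as the mean activity of its k nearest neighbors in descriptor space and scores the model with a leave-one-out cross-validated q². The variable selection step named in the abstract is omitted, the use of q² is an assumption here (it is a common QSAR statistic, not something the snippet states), and the function names and toy data are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Mean activity of the k training compounds closest to `query`
    (Euclidean distance in descriptor space)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


def loo_q2(x, y, k):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Hold compound i out and predict it from the remaining compounds.
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        press += (y[i] - knn_predict(rest_x, rest_y, x[i], k)) ** 2
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total


# Toy data: one made-up descriptor per compound, activity equal to it.
descriptors = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
activities = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(round(loo_q2(descriptors, activities, k=2), 3))
```

In a variable selection setting, a score like this would be recomputed for each candidate descriptor subset and used to rank the subsets.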
We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for…
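The snippet is cut off before it says what the threshold applies to. Assuming it is a distance cutoff between residue positions, a protein graph of the first kind could be built along these lines. This is a sketch under that assumption; `contact_graph`, the 5.0 cutoff, and the toy coordinates are all hypothetical.

```python
import math


def contact_graph(residues, cutoff):
    """Build an undirected residue graph.

    residues: list of (label, (x, y, z)) tuples, one point per residue.
    Returns the set of index pairs (i, j), i < j, whose points lie
    within `cutoff` of each other.
    """
    edges = set()
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if math.dist(residues[i][1], residues[j][1]) <= cutoff:
                edges.add((i, j))
    return edges


# Toy residues with made-up coordinates (not taken from a real structure).
residues = [
    ("ALA", (0.0, 0.0, 0.0)),
    ("GLY", (3.8, 0.0, 0.0)),
    ("SER", (7.6, 0.0, 0.0)),
    ("LEU", (3.8, 3.0, 0.0)),
]
print(sorted(contact_graph(residues, cutoff=5.0)))
```

The vertex labels (residue types) stay attached to the indices, so a subgraph miner can later compare labeled patterns across proteins.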
The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric that quantifies the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It could be expressed in many different ways, e.g., using Tanimoto…
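One possible reading of a Tanimoto-based distance to model is sketched below: each molecule is a set of fingerprint features, similarity is the Tanimoto coefficient |A ∩ B| / |A ∪ B|, and the distance is one minus the highest similarity to any training molecule. The max-similarity definition is an assumption, since the snippet is truncated before it gives one, and the function names and feature sets are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(union)


def distance_to_model(training_fps, query_fp):
    """Assumed definition: 1 minus the highest Tanimoto similarity between
    the query and any training set molecule."""
    return 1.0 - max(tanimoto(fp, query_fp) for fp in training_fps)


# Toy fingerprints: each set holds the indices of the features present.
training_fps = [{1, 2, 3}, {5, 6}]
query_fp = {2, 3, 4}
print(distance_to_model(training_fps, query_fp))
```

A larger distance means the query compound is less like anything the model was trained on, so its prediction would be treated with less confidence.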
Quantitative Structure-Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using k nearest neighbors (kNN) variable…
Protein structural annotation and classification is an important problem in bioinformatics. We report on the development of an efficient subgraph mining technique and its application to finding characteristic substructural patterns within protein structural families. In our method, protein structures are represented by graphs where the nodes are residues…