Aaron M. Smalter

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung...
The discovery of human genes that contribute to the appearance and growth of hereditary diseases is an important problem in bioinformatics research. Many techniques have been devised for classifying genes based on information from a variety of sources such as sequence and functional annotation. Recently, the use of topological information in protein-protein...
Clostridium botulinum type A neurotoxin (BoNT/A complex) is of great interest to the pharmaceutical industry. The drug itself is a natural complex of the toxin and a number of associated proteins. Surprisingly, relatively little is known about the exact structure and stability of the 900 kDa BoNT/A complex and its component proteins with the exception of...
Structured data, including sets, sequences, trees, and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has...
Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully...
Classifying objects that are sampled jointly from two or more domains has many applications. The tensor product feature space is useful for modeling interactions between feature sets in different domains, but feature selection in the tensor product feature space is challenging. Conventional feature selection methods ignore the structure of the feature space...
In this paper we introduce a novel graph classification algorithm and demonstrate its efficacy in drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to create features capturing graph local topology. We design a novel graph kernel function to utilize the created feature to build predictive models...
In this paper we propose new methods of chemical structure classification based on the integration of graph database mining from data mining and graph kernel functions from machine learning. In our method, we first identify a set of general graph patterns in chemical structure data. These patterns are then used to augment a graph kernel function that...
The high affinity of certain cellular polyanions for many proteins (polyanion-binding proteins (PABPs)) has been demonstrated previously. It has been hypothesized that such polyanions may be involved in protein structure stabilization, stimulation of folding through chaperone-like activity, and intra- and extracellular protein transport as well as...
Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical...