Classifying objects that are sampled jointly from two or more domains has many applications. The tensor product feature space is useful for modeling interactions between feature sets in different domains, but feature selection in the tensor product feature space is challenging. Conventional feature selection methods ignore the structure of the feature space ...
Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully ...
Although the human lung cytochrome P450 2A13 (CYP2A13) and its liver counterpart cytochrome P450 2A6 (CYP2A6) are 94% identical in amino acid sequence, they metabolize a number of substrates with substantially different efficiencies. To determine differences in binding for a diverse set of cytochrome P450 2A ligands, we have measured the spectral binding ...
Structured data, including sets, sequences, trees, and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has ...
In this paper, we introduce a novel graph classification algorithm and demonstrate its efficacy in drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to create features capturing local graph topology. We design a novel graph kernel function that uses the created features to build predictive models ...
Type III secretion (TTS) is an essential virulence function for Shigella flexneri that delivers the effector proteins responsible for bacterial invasion of intestinal epithelial cells. The Shigella TTS apparatus (TTSA) consists of a basal body that spans the bacterial inner and outer membranes and a needle exposed at the pathogen surface. At the distal ...
Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical ...
BACKGROUND: Lung cancer is the leading cause of death from cancer in the world, and its treatment is dependent on the type and stage of cancer detected in the patient. Molecular biomarkers that can characterize the cancer phenotype are thus a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to employ genomic ...
BACKGROUND: The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge. RESULTS: A total of 3108 sequence signatures were ...
Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical ...