Gerald H. Lushington

Graph data mining is an active research area. Graphs are general modeling tools to organize information from heterogeneous sources and have been applied in many scientific, engineering, and business fields. With the fast accumulation of graph data, building highly accurate predictive models for graph data emerges as a new challenge that has not been fully…
Classifying objects that are sampled jointly from two or more domains has many applications. The tensor product feature space is useful for modeling interactions between feature sets in different domains, but feature selection in the tensor product feature space is challenging. Conventional feature selection methods ignore the structure of the feature space…
Structured data, including sets, sequences, trees, and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has…
Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for…
BACKGROUND: Lung cancer is the leading cause of death from cancer in the world and its treatment is dependent on the type and stage of cancer detected in the patient. Molecular biomarkers that can characterize the cancer phenotype are thus a key tool in planning a therapeutic response. A common protocol for identifying such biomarkers is to employ genomic…
In this paper we introduce a novel graph classification algorithm and demonstrate its efficacy in drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to create features capturing graph local topology. We design a novel graph kernel function to utilize the created feature to build predictive models…
BACKGROUND: The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge. RESULTS: A total of 3108 sequence signatures were…
Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical…
The NIH Molecular Libraries Initiative (MLI), launched in 2004 with initial goals of identifying chemical probes for characterizing gene function and druggability, has produced PubChem, a chemical genomics knowledgebase for fostering translation of basic research into new therapeutic strategies. This paper assesses progress toward these goals by evaluating…
Lung cancer accounts for the most cancer-related deaths. The identification of cancer-associated genes and the related pathways is essential to prevent many types of cancer. In this paper, a more systematic approach is considered. First, we performed pathway analysis using the hypergeometric distribution (HGD) and significantly overrepresented sets of reactions…
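
For readers unfamiliar with the pathway analysis mentioned in the last abstract, the following is a minimal sketch of a generic hypergeometric over-representation test. It is not taken from the paper: the function name `hypergeom_enrichment_pvalue`, its parameter names, and the toy numbers are all assumptions chosen for illustration. It computes the probability of seeing at least `overlap` pathway genes in a gene list of size `sample_size`, when `pathway_size` of the `population` background genes belong to the pathway.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, pathway_size, sample_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    Hypothetical helper for illustration only:
    population   -- number of background genes
    pathway_size -- background genes annotated to the pathway
    sample_size  -- genes in the list being tested
    overlap      -- genes in the list that belong to the pathway
    """
    # The overlap cannot exceed either the pathway or the tested list.
    upper = min(pathway_size, sample_size)
    # Number of ways to draw the tested list from the background.
    total = comb(population, sample_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (assumed): 10 background genes, 4 in the pathway,
# 3 genes tested, 2 of which fall in the pathway.
p_value = hypergeom_enrichment_pvalue(10, 4, 3, 2)
print(p_value)
```

A small p-value suggests the pathway is overrepresented in the tested list. In practice the test is repeated over many pathways, so a multiple-testing correction is usually applied afterwards.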