William Noble Grundy

Learn More
MOTIVATION Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately recognize a given family. For families in which there are few known(More)
In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence(More)
Many advanced software tools fail to reach a wide audience because they require specialized hardware, installation expertise, or an abundance of CPU cycles. The worldwide web offers a new opportunity for distributing such systems. One such program, MEME, discovers repeated patterns, called motifs, in sets of DNA or protein sequences. This tool is now(More)
In this paper we consider the problem of extracting information from the upstream untranslated regions of genes to make predictions about their transcriptional regulation. We present a method for classifying genes based on motif-based hidden Markov models (HMMs) of their promoter regions. Sequence motifs discovered in yeast promoters are used to construct(More)
We introduce a new method of functionally classifying genes using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). We describe SVMs that use different similarity metrics including a simple dot product of gene expression vectors, polynomial versions of the dot product,(More)
An important goal in bioinformatics is determining the homology and function of proteins from their sequences. Pairwise sequence similarity algorithms are often employed for this purpose. This paper describes a method for improving the accuracy of such algorithms using knowledge about families of proteins. The method requires a library of protein families(More)