A major mode of gene regulation occurs via the binding of specific proteins to specific DNA sequences. The availability of complete bacterial genome sequences offers an unprecedented opportunity to describe networks of such interactions by correlating existing experimental data with computational predictions. Of the 240 candidate Escherichia coli ...
Mining the emerging abundance of microbial genome sequences for hypotheses is an exciting prospect of "functional genomics". At the forefront of this effort, we compared the predictions of the complete Escherichia coli genomic sequence with the observed gene products by assessing 381 proteins for their mature N-termini, in vivo abundances, isoelectric ...
Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress ...
New generations of DNA sequencing technologies are enabling the systematic study of genetic derangement in cancers. Sequencing of cancer exomes, transcriptomes, or even entire cancer genomes is now possible, though technical and economic challenges remain. Cancer samples are inherently heterogeneous and are often contaminated with normal DNA, placing ...
The nucleotide sequence of 1.5 Mb of genomic DNA from Mycobacterium leprae was determined using computer-assisted multiplex sequencing technology. This brings the 2.8-Mb M. leprae genome sequence to approximately 66% completion. The sequences, derived from 43 recombinant cosmids, contain 1046 putative protein-coding genes, 44 repetitive regions, 3 tRNAs, ...
Recent advances in sequencing technology have created unprecedented opportunities for biological research. However, the increasing throughput of these technologies has created many challenges for data management and analysis. As the demand for sophisticated analyses increases, the development time of software and algorithms is outpacing the speed of ...
We present a computational approach based on a local search strategy that discovers sets of proteins that preferentially interact with each other. Such sets are referred to as protein communities and are likely to represent functional modules. Preferential interaction between module members is quantified via an analytical framework based on a network null ...
DGq is the alpha subunit of the heterotrimeric GTPase (G alpha), which couples rhodopsin to phospholipase C in Drosophila vision. We have uncovered three duplicated exons in dgq by scanning the GenBank database for unrecognized coding sequences. These alternative exons encode sites involved in GTPase activity and G beta-binding, NorpA (phospholipase ...