G. Christian Overton

The integration of heterogeneous data sources and software systems is a major issue in the biomedical community, and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of …
Data of interest to biomedical researchers associated with the Human Genome Project (HGP) is stored all over the world in a number of different electronic data formats and is accessible through a variety of interfaces and retrieval languages. These data sources include conventional relational databases with SQL interfaces, formatted text files on top of which …
Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 …
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g., ASN.1 and ACE) as well as sequence analysis packages (e.g., BLAST and FASTA). These formats and packages contain a number of data types not found in conventional …
The Transcription Regulatory Regions Database (TRRD) has been developed to accumulate experimental information on the structure-function features of regulatory regions of eukaryotic genes. Each entry in TRRD corresponds to a particular gene and contains a description of the structure-function features of its regulatory regions (transcription factor binding …
To accelerate gene discovery and facilitate genetic mapping in the protozoan parasite Toxoplasma gondii, we have generated >7000 new ESTs from the 5' ends of randomly selected tachyzoite cDNAs. Comparison of the ESTs with the existing gene databases identified possible functions for more than 500 new T. gondii genes by virtue of sequence motifs shared with …
MOTIVATION A protocol is described to attach expression patterns to genes represented in a collection of hybridization array experiments. Discrete values are used to provide an easily interpretable description of differential expression. Binning cutoffs for each sample type are chosen automatically, depending on the desired false-positive rate for the …
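The idea of choosing binning cutoffs from a desired false-positive rate can be illustrated with a short sketch. This is not the protocol from the paper; it assumes that each sample type comes with log ratios from a null experiment (for example, replicate or self-versus-self hybridizations) in which no gene is truly differentially expressed, and it places symmetric cutoffs at the fpr/2 and 1 - fpr/2 quantiles of that null distribution, so roughly a fraction fpr of unchanged genes would be called "up" or "down". The function names, the three-level -1/0/+1 coding, and the sample names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile of an already-sorted sequence, 0 <= q <= 1."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def choose_cutoffs(null_ratios: Sequence[float], fpr: float) -> Tuple[float, float]:
    """Cutoffs leaving about fpr/2 of the (assumed) null distribution in each tail."""
    vals = sorted(null_ratios)
    return quantile(vals, fpr / 2), quantile(vals, 1 - fpr / 2)


def discretize(value: float, cutoffs: Tuple[float, float]) -> int:
    """Map a log ratio to -1 (down), 0 (unchanged) or +1 (up)."""
    low, high = cutoffs
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def attach_patterns(
    expression: Dict[str, Dict[str, float]],
    null_by_sample: Dict[str, Sequence[float]],
    fpr: float,
) -> Dict[str, Dict[str, int]]:
    """expression[gene][sample] is a log ratio; returns pattern[gene][sample] in {-1, 0, 1}.

    Hypothetical input: null_by_sample[sample] holds null log ratios for that sample type,
    so each sample type gets its own cutoffs.
    """
    cutoffs = {s: choose_cutoffs(v, fpr) for s, v in null_by_sample.items()}
    return {
        gene: {s: discretize(x, cutoffs[s]) for s, x in per_sample.items()}
        for gene, per_sample in expression.items()
    }


if __name__ == "__main__":
    null = {"liver": [i / 10 for i in range(-10, 11)]}  # toy null distribution
    expr = {"geneA": {"liver": 1.5}, "geneB": {"liver": 0.2}, "geneC": {"liver": -2.0}}
    print(attach_patterns(expr, null, fpr=0.1))
```

Computing cutoffs per sample type, rather than using one global threshold, lets noisier sample types receive wider bins while every type targets the same false-positive rate.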
MOTIVATION AND RESULTS A relational schema is described for capturing highly parallel gene expression experiments using different technologies. This schema grew out of efforts to build a database for collaborators working on different biological systems and using different types of platforms in their gene expression experiments as well as different types of …
Transcription factors, proteins required for the regulation of gene expression, recognize and bind short stretches of DNA on the order of 4 to 10 bases in length. In general, each factor recognizes a family of "similar" sequences rather than a single unique sequence. Ultimately, the transcriptional state of a gene is determined by the cooperative …
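A common way to model a family of "similar" binding sequences, rather than one unique string, is a position weight matrix (PWM). The sketch below is a generic illustration of that standard technique, not the method of the paper: it builds a log-odds PWM from a few aligned sites using a pseudocount and a uniform background, then scans a sequence for windows scoring at or above a threshold. The example sites, the test sequence, the pseudocount and the threshold are made-up toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def build_pwm(sites: Sequence[str], pseudocount: float = 0.5,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds PWM from equal-length, aligned, uppercase ACGT binding sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}  # pseudocounts avoid log(0)
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def score(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan(pwm: List[Dict[str, float]], sequence: str,
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):  # skip ambiguous bases such as N
            continue
        s = score(pwm, window)
        if s >= threshold:
            hits.append((start, window, s))
    return hits


if __name__ == "__main__":
    toy_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]  # hypothetical sites
    pwm = build_pwm(toy_sites)
    for hit in scan(pwm, "CCTGACTCAGGTGAGTCATT", threshold=8.0):
        print(hit)
```

With these toy sites both TGACTCA and the variant TGAGTCA are reported, while windows that mismatch a fully conserved column fall below the threshold; an exact match against the single consensus TGACTCA would have missed the variant.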
We have performed a systematic analysis of gene identification in genomic sequence by similarity search against expressed sequence tags (ESTs) to assess the suitability of this method for automated annotation of the human genome. A BLAST-based strategy was constructed to examine the potential of this approach, and was applied to test sets containing all …