Searching the Expressed Sequence Tag (EST) Databases: Panning for Genes

Abstract

The genomes of living organisms contain many elements, including genes coding for proteins. The portions of the genes expressed as mature mRNA, collectively known as the transcriptome, represent only a small part of the genome. The expressed sequence tag (EST) databases contain an increasingly large part of the transcriptome of many species. For this reason, these databases are probably the most abundant source of new coding sequences available today. However, the raw data deposited in the EST databases are to a large extent unorganised, unannotated, redundant and of relatively low quality. This paper reviews some of the characteristics of the EST data, and the methods that can be used to find novel protein sequences within them. It also documents a collection of databases, software and web sites that can be useful to biologists interested in mining the EST databases over the Internet, or in establishing a local environment for such analyses.

DOI: 10.1093/bib/1.1.76

Cite this paper

@article{Jongeneel2000SearchingTE, title={Searching the Expressed Sequence Tag (EST) Databases: Panning for Genes}, author={C. Victor Jongeneel}, journal={Briefings in Bioinformatics}, year={2000}, volume={1}, number={1}, pages={76-92} }