Assembling Genes from Predicted Exons In Linear Time with Dynamic Programming

Abstract

In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from this pool as sequences of nonoverlapping, frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is taken to be the gene most likely encoded by the query DNA sequence. For additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons.

The quadratic algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the best of the highest scoring genes ending at each compatible preceding exon. The algorithm presented here relies on the simple observation that this best preceding gene can be stored and updated as the scan proceeds, rather than recomputed for every exon. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position.

In addition, the algorithm described here does not assume an underlying gene structure model. Instead, valid gene structures are defined externally in the so-called Gene Model, which specifies simply which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows great flexibility in formulating the gene identification problem. In particular, it allows multiple-gene, two-strand predictions and the inclusion of gene features other than coding exons (such as promoter elements) in valid gene structures.
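The store-and-update idea can be illustrated with a minimal Python sketch. This is not the paper's implementation, and everything in it beyond the abstract is an assumption: exons lie on one strand with inclusive coordinates, frame compatibility is modeled as equality of hypothetical `phase_out`/`phase_in` labels, the gene score is the plain sum of exon scores, and the Gene Model is a hypothetical set of allowed (upstream, downstream) feature-type pairs with `BEGIN`/`END` pseudo-features. Because the table `best` is keyed by (feature type, phase), its size does not depend on the number of exons, so the simultaneous acceptor/donor scan does constant work per exon; the two sorts in the sketch add O(n log n) unless the exons are already ordered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Exon:
    start: int      # acceptor-side coordinate (first base of the exon)
    end: int        # donor-side coordinate (last base of the exon)
    kind: str       # hypothetical feature type, e.g. "First", "Internal", "Terminal"
    phase_in: int   # hypothetical frame label required from the upstream exon (0-2)
    phase_out: int  # hypothetical frame label passed to the downstream exon (0-2)
    score: float


# Hypothetical gene model: allowed (upstream feature, downstream feature) pairs.
# "BEGIN" and "END" are pseudo-features for the start and end of a prediction.
GeneModel = Set[Tuple[str, str]]


def best_gene(exons: List[Exon], model: GeneModel) -> Tuple[float, List[int]]:
    """Highest additive score and exon indices of a valid gene; (-inf, []) if none."""
    n = len(exons)
    by_acceptor = sorted(range(n), key=lambda i: exons[i].start)
    by_donor = sorted(range(n), key=lambda i: exons[i].end)
    gene_score = [float("-inf")] * n          # best gene ending at exon i
    prev: List[Optional[int]] = [None] * n    # traceback pointers
    # best[(kind, phase_out)] = (score, index) of the best gene ending at an exon
    # whose donor lies strictly upstream of the acceptor currently being scanned.
    best: Dict[Tuple[str, int], Tuple[float, int]] = {}
    d = 0
    for i in by_acceptor:
        b = exons[i]
        # Donor scan: exons ending before b starts become candidate predecessors.
        # Their gene scores are final, since start <= end < b.start means they
        # were already visited in acceptor order.
        while d < n and exons[by_donor[d]].end < b.start:
            j = by_donor[d]
            key = (exons[j].kind, exons[j].phase_out)
            if key not in best or gene_score[j] > best[key][0]:
                best[key] = (gene_score[j], j)
            d += 1
        # Options: start a new gene at b, or extend a stored compatible gene.
        # `best` has at most (#feature types x 3) entries, independent of n.
        options: List[Tuple[float, Optional[int]]] = []
        if ("BEGIN", b.kind) in model:
            options.append((0.0, None))
        for (up_kind, phase), (score, j) in best.items():
            if phase == b.phase_in and (up_kind, b.kind) in model:
                options.append((score, j))
        if options:
            score, j = max(options, key=lambda t: t[0])
            gene_score[i] = b.score + score
            prev[i] = j
    # The best complete gene ends at an exon allowed immediately upstream of "END".
    best_s: float = float("-inf")
    best_i: Optional[int] = None
    for i in range(n):
        if (exons[i].kind, "END") in model and gene_score[i] > best_s:
            best_s, best_i = gene_score[i], i
    path: List[int] = []
    while best_i is not None:
        path.append(best_i)
        best_i = prev[best_i]
    return best_s, path[::-1]


MODEL: GeneModel = {
    ("BEGIN", "First"), ("First", "Internal"), ("Internal", "Internal"),
    ("First", "Terminal"), ("Internal", "Terminal"), ("Terminal", "END"),
}
EXONS = [
    Exon(10, 50, "First", 0, 2, 5.0),
    Exon(40, 90, "First", 0, 1, 8.0),
    Exon(100, 150, "Internal", 2, 0, 3.0),
    Exon(120, 160, "Internal", 1, 1, 1.0),
    Exon(200, 260, "Terminal", 0, 0, 4.0),
    Exon(210, 240, "Terminal", 1, 0, 2.5),
]
print(best_gene(EXONS, MODEL))  # prints (12.0, [0, 2, 4])
```

Since validity lives entirely in `model`, the assembly loop never hard-codes a gene structure. For example, adding the hypothetical pair `("Terminal", "First")` would let a single prediction chain several genes, in the spirit of the multiple-gene predictions mentioned above (the phase labels of that pair would still have to agree).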

DOI: 10.1089/cmb.1998.5.681


7 Figures and Tables


Semantic Scholar estimates that this publication has 115 citations based on the available data.


Cite this paper

@article{Guig1998AssemblingGF,
  title={Assembling Genes from Predicted Exons In Linear Time with Dynamic Programming},
  author={Roderic Guig{\'o}},
  journal={Journal of computational biology : a journal of computational molecular cell biology},
  year={1998},
  volume={5},
  number={4},
  pages={681-702}
}