Evaluation of gene structure prediction programs.

@article{Burset1996EvaluationOG,
  title={Evaluation of gene structure prediction programs},
  author={M. Burset and R. Guig{\'o}},
  journal={Genomics},
  year={1996},
  volume={34},
  number={3},
  pages={353--367}
}
We evaluate a number of computer programs designed to predict the structure of protein coding genes in genomic DNA sequences. Computational gene identification is set to play an increasingly important role in the development of the genome projects, as emphasis turns from mapping to large-scale sequencing. The evaluation presented here serves both to assess the current status of the problem and to identify the most promising approaches to ensure further progress. The programs analyzed were…

An assessment of gene prediction accuracy in large DNA sequences.
TLDR
Though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, there is a long way to go before the authors can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
Evaluating and improving the accuracy of computational gene-finding on mammalian DNA sequences
TLDR
Three methods were developed, each having some advantages over the other two, and each of them offering higher prediction accuracy on the test dataset than any gene-finding program currently available.
10 – Prediction of Human Gene Structure
TLDR
The most important aspects of gene structure prediction are described: functional sites in nucleotide sequences, functional regions in nucleotide sequences, protein-coding gene structure prediction, analysis of potential proteins coded by predicted genes, and RNA-coding gene structure prediction.
Evaluation of gene-finding programs on mammalian sequences.
TLDR
This analysis shows that the new generation of programs has substantially better results than the programs analyzed in previous studies, and pinpoints the strengths and weaknesses of each individual program as well as those of computational gene-finding in general.
Computational methods for the identification of genes in vertebrate genomic sequences.
  • J. Claverie
  • Biology, Medicine
  • Human molecular genetics
  • 1997
TLDR
Although performance is satisfactory for identifying the coding moiety of genes (internal coding exons), determining the full extent of the transcript (5' and 3' extremities of the gene) and locating promoter regions are still unreliable.
Sequence Similarity Based Gene Prediction
TLDR
This chapter evaluates the accuracy of gene predictions derived exclusively from sequence similarity database searches, in particular the ability of these methods to correctly infer the exonic structure of genes in higher eukaryotic organisms.
A new approach for gene prediction using comparative sequence analysis
TLDR
This study sought to illuminate the impact of evolutionary distances on the performance of the proposed gene-finding program based on cross-species sequence comparison and proposed a model by which coding sequence could be identified by comparing sequences of multiple species, both close and approximately distant.
Comparative Gene Prediction Based on Gene Structure Conservation
TLDR
GeneAlign is a program that predicts the genes on one sequence by measuring the similarity between the predicted sequence and related genes annotated on another genome; it applies CORAL, a heuristic linear-time alignment tool, to determine whether the regions flanked by candidate signals are similar to the annotated exons.
Gene annotation: prediction and testing.
TLDR
A highly curated human gene-set made publicly available will be a great asset for the experimental community and for future comparative genome projects.
Homology-based gene prediction using neural nets.
TLDR
GIN is able to recognize multiple genes within genomic DNA, as demonstrated by the identification of a globin gene (gamma-globin-1(G)) that has not been annotated as a coding region in the widely used test set of Burset and Guigó.
