Prediction of complete gene structures in human genomic DNA.

Christopher B. Burge and Samuel Karlin. Journal of Molecular Biology, volume 268, issue 1.

We introduce a general probabilistic model of the gene structure of human genomic sequences which incorporates descriptions of the basic transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived to account for the many substantial differences in gene density and structure observed in distinct C + G compositional regions of the human genome. In addition… 

Computational inference of homologous gene structures in the human genome.

A new gene identification algorithm, GenomeScan, combines exon-intron and splice signal models with similarity to known protein sequences in an integrated model. It offers an accurate and efficient automated approach for identifying genes in higher eukaryotic genomes and provides a first-level annotation of the draft human genome.

The Prediction of Human Genes in DNA Based on a Generalized Hidden Markov Model

The results show that the proposed method achieves better prediction accuracy than some existing methods, and over 70% of exons can be identified exactly.

A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene

Targeted discovery of novel human exons by comparative genomics.

A genome-wide effort to identify human genes not yet in the gene catalogs was carried out as part of the Mammalian Gene Collection project. Gene predictions were produced by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, and predicted novel genes were then tested by RT-PCR.

Gene Structure Prediction Using an Orthologous Gene of Known Exon-Intron Structure

A novel approach to predicting the exon-intron structures of mouse genes by incorporating constraints from orthologous human genes using techniques that have previously been exploited in speech and natural language processing applications is reported.

2 Gene prediction methods

The task in gene prediction (or genome annotation) is to determine a labeling that assigns to each base a label according to the functionality of that part of the gene; one can think of gene prediction as parsing a sequence of letters into words.
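The labeling view described above can be made concrete with a small sketch. The label alphabet (`E` for exon, `I` for intron, `N` for intergenic), the half-open 0-based coordinates, and the function names `label_bases` and `parse_labels` are illustrative assumptions, not taken from the cited chapter.

```python
from itertools import groupby


def label_bases(seq_len, exons):
    """Assign one label per base for a single gene.

    exons: sorted list of (start, end) half-open, 0-based intervals.
    Labels (hypothetical alphabet): 'E' exon, 'I' intron, 'N' intergenic.
    """
    labels = ["N"] * seq_len
    if not exons:
        return "".join(labels)
    # Everything between the first exon start and last exon end is genic;
    # mark it as intron first, then overwrite the exon positions.
    gene_start, gene_end = exons[0][0], exons[-1][1]
    for i in range(gene_start, gene_end):
        labels[i] = "I"
    for start, end in exons:
        for i in range(start, end):
            labels[i] = "E"
    return "".join(labels)


def parse_labels(labels):
    """Collapse a per-base labeling into (label, start, end) segments ("words")."""
    segments = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, pos, pos + length))
        pos += length
    return segments
```

A gene predictor in this picture produces the label string; `parse_labels` then recovers the segment boundaries that correspond to exons and introns.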

Computational methods for the identification of genes in vertebrate genomic sequences.

Although performance is satisfactory for identifying the coding portion of genes (internal coding exons), determination of the full extent of the transcript (5' and 3' ends of the gene) and the location of promoter regions remain unreliable.

Gene recognition in eukaryotic DNA by comparison of genomic sequences

A spliced alignment algorithm, implemented in the Pro-Gen software, aligns candidate exon chains of two homologous genomic sequence fragments from different species and allows for distant comparisons.

GeneBuilder: interactive in silico prediction of gene structure

The GeneBuilder system is based on prediction of functional signals and coding regions by different approaches, combined with similarity searches in protein and EST databases; gene structure models are obtained using a dynamic programming method.

Gene prediction with a hidden Markov model

A generalized Hidden Markov Model (GHMM) for eukaryotic genomic sequences is introduced. Extrinsic information from EST database searches can significantly improve the accuracy of gene prediction programs when combined with protein database searches.



Identification of protein coding regions in genomic DNA.

A computer program, GeneParser, is developed that identifies and determines the fine structure of protein-coding genes in genomic DNA sequences. It can rapidly generate ranked suboptimal solutions, each of which is the optimum solution containing a given intron-exon junction.

Prediction of gene structure.

Evaluation of gene structure prediction programs.

The predictive accuracy of the programs analyzed was lower than originally reported, suggesting that the programs depend too heavily on the particularities of the examples they learn from.
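As a rough illustration of how predictive accuracy can be scored at the nucleotide level, the sketch below compares a predicted labeling with a reference labeling. It assumes the convention common in gene-prediction benchmarks, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP); the evaluation summarized above may use additional or different measures, and the function name `nucleotide_accuracy` and the `E` label are hypothetical.

```python
def nucleotide_accuracy(true_labels, pred_labels, coding="E"):
    """Return (sensitivity, specificity) for coding bases.

    Assumed convention: sensitivity = TP / (TP + FN),
    specificity = TP / (TP + FP).
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    tp = fp = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if t == coding and p == coding:
            tp += 1  # coding base correctly predicted as coding
        elif p == coding:
            fp += 1  # non-coding base predicted as coding
        elif t == coding:
            fn += 1  # coding base that was missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity
```

Scoring a program on sequences it was not trained on, as the evaluation above did, is what exposes the dependence on training examples.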

Gene recognition via spliced sequence alignment.

A spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein.

A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA

A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence and provides simple solutions for integrating cardinality constraints, reading frame constraints, "indels", and homology searching.
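To give a feel for what finding the best parse means, here is a minimal Viterbi decoder. It is a deliberate simplification: a plain two-state HMM rather than a GHMM, so it has no explicit state length distributions, reading-frame constraints, or homology terms. The states (`C` coding, `N` non-coding) and all probabilities are toy values invented for illustration.

```python
import math


def viterbi(seq, states, log_start, log_trans, log_emit):
    """Return the most probable state path for seq under a plain HMM."""
    if not seq:
        return ""
    # best[s]: log-probability of the best path ending in state s so far.
    best = {s: log_start[s] + log_emit[s][seq[0]] for s in states}
    back = []  # one dict of back-pointers per position after the first
    for ch in seq[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][ch]
            pointers[s] = p
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Toy parameters (assumptions): coding is GC-rich, non-coding is AT-rich.
STATES = ["C", "N"]
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "C": {"C": math.log(0.9), "N": math.log(0.1)},
    "N": {"C": math.log(0.1), "N": math.log(0.9)},
}
LOG_EMIT = {
    "C": {b: math.log(0.4 if b in "GC" else 0.1) for b in "ACGT"},
    "N": {b: math.log(0.4 if b in "AT" else 0.1) for b in "ACGT"},
}
```

Calling `viterbi("ATATATGCGCGCATATAT", STATES, LOG_START, LOG_TRANS, LOG_EMIT)` labels the GC-rich middle as `C` and the flanks as `N`. A GHMM generalizes this by letting each state emit a whole segment whose length follows its own distribution.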

Predicting Pol II promoter sequences using transcription factor binding sites.

A computer program, PROMOTER SCAN, has been developed to recognize a high percentage of Pol II promoter sequences while allowing only a small rate of false positives, and is now being developed for public distribution.

Large exon size does not limit splicing in vivo

Plasmid clones containing exon inserts of defined sizes are tested, and it is concluded that a limitation in exon size is not part of the exon definition mechanism.

Selection of splice sites in pre-mRNAs with short internal exons

Model pre-mRNAs containing two introns and three exons, derived from the human beta-globin gene, were used to study the effects of internal exon length on splice site selection. The results suggest that a balance between the length of the uninterrupted polypyrimidine tract and the length of the exon is an important determinant of the relative strength of the splice sites, ensuring correct splicing patterns of multi-intron pre-mRNAs.

Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames.

The precision of this approach is better than that of other methods and has been tested on a larger data set. The approach also provides a means for predicting exon-exon junctions in cDNA sequences, which can be useful for selecting optimal PCR primers.