Prediction of complete gene structures in human genomic DNA.
@article{Burge1997PredictionOC, title={Prediction of complete gene structures in human genomic DNA.}, author={Christopher B. Burge and Samuel Karlin}, journal={Journal of Molecular Biology}, year={1997}, volume={268}, number={1}, pages={78--94} }
We introduce a general probabilistic model of the gene structure of human genomic sequences which incorporates descriptions of the basic transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived to account for the many substantial differences in gene density and structure observed in distinct C + G compositional regions of the human genome. In addition…
4,017 Citations
Computational inference of homologous gene structures in the human genome.
- Biology, Engineering
- Genome research
- 2001
GenomeScan is a new gene identification algorithm that combines exon-intron and splice signal models with similarity to known protein sequences in an integrated model. It offers an accurate and efficient automated approach for identifying genes in higher eukaryotic genomes and provides a first-level annotation of the draft human genome.
The Prediction of Human Genes in DNA Based on a Generalized Hidden Markov Model
- Biology
- CCBR
- 2016
The results show that the proposed method has better prediction accuracy than some existing methods, and over 70% of exons can be identified exactly.
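As a rough illustration of what "identified exactly" can mean, the sketch below counts a predicted exon as correct only when both of its boundaries match an annotated exon. The function name, the `(start, end)` coordinate convention and the example coordinates are assumptions made for illustration; the summary above does not state the evaluation protocol the authors used.

```python
def exon_level_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the exon level.

    Exons are (start, end) tuples (a hypothetical convention). A predicted
    exon counts as correct only if both boundaries match a true exon exactly.
    """
    true_set = set(true_exons)
    pred_set = set(predicted_exons)
    exact = true_set & pred_set
    # Sensitivity: fraction of annotated exons that were predicted exactly.
    sensitivity = len(exact) / len(true_set) if true_set else 0.0
    # Specificity (in the gene-prediction sense): fraction of predicted
    # exons that are exactly right.
    specificity = len(exact) / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity


# Made-up coordinates: two of three annotated exons are recovered exactly.
annotated = [(100, 200), (300, 400), (500, 650)]
predicted = [(100, 200), (300, 410), (500, 650), (700, 750)]
sn, sp = exon_level_accuracy(annotated, predicted)
```

Here the second predicted exon overlaps an annotated one but misses its 3' boundary by ten bases, so it does not count as an exact match.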
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions
- Biology
- PLoS Comput. Biol.
- 2006
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene…
Targeted discovery of novel human exons by comparative genomics.
- Biology
- Genome research
- 2007
A genome-wide effort, carried out as part of the Mammalian Gene Collection project, to identify human genes not yet in the gene catalogs: gene predictions are produced by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, and predicted novel genes are then tested by RT-PCR.
Gene Structure Prediction Using an Orthologous Gene of Known Exon-Intron Structure
- Biology
- Applied bioinformatics
- 2004
A novel approach to predicting the exon-intron structures of mouse genes by incorporating constraints from orthologous human genes using techniques that have previously been exploited in speech and natural language processing applications is reported.
2 Gene prediction methods
- Biology
- 2008
The task in gene prediction (or genome annotation) is to determine a labeling that assigns to each base a label according to the functionality of that part of the gene; one can think of gene prediction as parsing a sequence of letters into words.
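The per-base labeling view can be sketched with a toy two-state hidden Markov model decoded by the Viterbi algorithm. This is a minimal illustration, not the model of any paper listed here: it is a plain HMM rather than a generalized HMM with explicit length distributions, and the state names and all probabilities are hypothetical values chosen only so that G/C-rich stretches look "coding".

```python
import math

STATES = ("N", "C")  # N = noncoding, C = coding (toy labels)

# Hypothetical parameters, for illustration only.
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq):
    """Return the most probable state label for each base of seq."""
    if not seq:
        return ""
    # score[s] = best log-probability of any labeling that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full labeling.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


labels = viterbi("ATATATATGCGCGCGCGCGCATATATAT")
```

The returned string has one label per input base, which is the "parse" of the sequence into labeled segments. Real gene finders use many more states (exons by phase, introns, splice signals, intergenic regions) and parameters estimated from annotated data.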
Computational methods for the identification of genes in vertebrate genomic sequences.
- Biology
- Human molecular genetics
- 1997
Although performance is satisfactory for identifying the coding moiety of genes (internal coding exons), determination of the full extent of the transcript (5' and 3' extremities of the gene) and of the location of promoter regions is still unreliable.
Gene recognition in eukaryotic DNA by comparison of genomic sequences
- Biology
- Bioinform.
- 2001
A spliced alignment algorithm that aligns candidate exon chains of two homologous genomic sequence fragments from different species, implemented in Pro-Gen software, which allows for distant comparisons.
GeneBuilder: interactive in silico prediction of gene structure
- Biology
- Bioinform.
- 1999
The GeneBuilder system is based on prediction of functional signals and coding regions by different approaches, in combination with similarity searches in protein and EST databases, with gene structure models obtained by a dynamic programming method.
Gene prediction with a hidden Markov model
- Biology
- 2004
A so-called generalized Hidden Markov Model (GHMM) for eukaryotic genomic sequences is introduced, and extrinsic information from EST database searches is shown to significantly improve the accuracy of gene prediction programs when combined with protein database searches.
88 References
Identification of protein coding regions in genomic DNA.
- Biology
- Journal of molecular biology
- 1995
A computer program, GeneParser, which identifies and determines the fine structure of protein genes in genomic DNA sequences and can rapidly generate ranked suboptimal solutions, each of which is the optimum solution containing a given intron-exon junction is developed.
Evaluation of gene structure prediction programs.
- Biology
- Genomics
- 1996
The results indicated that the predictive accuracy of the programs analyzed was lower than originally found, which indicates that the programs are overly dependent on the particularities of the examples they learn from.
Gene recognition via spliced sequence alignment.
- Biology
- Proceedings of the National Academy of Sciences of the United States of America
- 1996
A spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein.
A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA
- Computer Science
- ISMB
- 1996
A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence and provides simple solutions for integrating cardinality constraints, reading frame constraints, "indels", and homology searching.
Predicting Pol II promoter sequences using transcription factor binding sites.
- Biology
- Journal of molecular biology
- 1995
A computer program, PROMOTER SCAN, has been developed to recognize a high percentage of Pol II promoter sequences while allowing only a small rate of false positives, and is now being developed for public distribution.
Large exon size does not limit splicing in vivo
- Biology
- Molecular and cellular biology
- 1994
Plasmid clones containing exon inserts of defined sizes are tested, and it is concluded that a limitation in exon size is not part of the exon definition mechanism.
Selection of splice sites in pre-mRNAs with short internal exons
- Biology
- Molecular and cellular biology
- 1991
Model pre-mRNAs containing two introns and three exons, derived from the human beta-globin gene, were used to study the effects of internal exon length on splice site selection. The results suggest that a balance between the length of the uninterrupted polypyrimidine tract and the length of the exon is an important determinant of the relative strength of the splice sites, ensuring correct splicing patterns of multi-intron pre-mRNAs.
Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames.
- Biology
- Nucleic acids research
- 1994
The precision of this approach is better than that of other methods and has been tested on a larger data set; the approach also offers a means of predicting exon-exon junctions in cDNA sequences, which can be useful for selecting optimal PCR primers.