Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments

@article{Haas2007AutomatedEG,
  title={Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments},
  author={Brian J. Haas and Steven L. Salzberg and Wei Zhu and Mihaela Pertea and Jonathan E. Allen and Joshua Orvis and Owen White and C. Robin Buell and Jennifer R. Wortman},
  journal={Genome Biology},
  year={2007},
  volume={9},
  pages={R7}
}
EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure…
Current methods for automated annotation of protein-coding genes.
Software tools for gene prediction (the identification of protein-coding genes and their structure in genome sequences) are reviewed, along with current homology-based methods: comparative gene prediction and protein spliced alignments.
Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment
Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes.
Long Read Annotation (LoReAn): automated eukaryotic genome annotation based on long-read cDNA sequencing
It is shown that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA sequencing data generated from either the PacBio or MinION sequencing platforms, and correctly predicting gene structure and capturing genes missed by other annotation pipelines.
GeneMark-EP and -EP+: automatic eukaryotic gene prediction supported by spliced aligned proteins
GeneMark-EP is described, a tool that utilizes another source of external information, a protein database, readily available prior to the start of a sequencing project, and it is demonstrated that its gene prediction accuracy is higher than that of GeneMark-ES, particularly in large eukaryotic genomes.
Structural and Functional Annotation of Eukaryotic Genomes with GenSAS.
The Genome Sequence Annotation Server (GenSAS, https://www.gensas.org) is a secure, web-based genome annotation platform for structural and functional annotation, as well as manual curation.
Improving Re-annotation of Annotated Eukaryotic Genomes
This chapter describes the annotation of splice sites, open reading frames, encoded proteins and peptides, hints for functional annotation including phylogenetic and domain analysis as well as critical evaluation of data transfer procedures, and the genome annotation process.
gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks
This work presents a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments, with a flexible framework for defining gene models with reliable structural and functional attributes.
FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow
FA-nf is a pipeline implemented in Nextflow, a versatile computational workflow management engine, that integrates different annotation approaches, such as NCBI BLAST+, DIAMOND, InterProScan, and KEGG, and produces several files, including GO assignments, output summaries of the above-mentioned programs, and final annotation reports.
gFACs: Filtering, analysis, and conversion to unify genome annotations across alignment and gene prediction frameworks
gFACs is presented as a software package to filter, analyze, and convert predicted gene models and alignments with a flexible framework for defining gene models with reliable structural and functional attributes.
The discrepancies in the results of bioinformatics tools for genomic structural annotation
This work presents discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation to the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing.

References

Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies.
The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
Computational gene prediction using multiple sources of evidence.
Combiner, a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline, consistently outperforms even the best individual gene finder and can produce dramatic improvements in sensitivity and specificity.
Annotation of the Drosophila melanogaster euchromatic genome: a systematic review
Identification of so many unusual gene models in Drosophila suggests that some mechanisms for gene regulation are more prevalent than previously believed, and underscores the complex challenges of eukaryotic gene prediction.
Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA
Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene, in the context of the EGASP project.
VEGA, the genome browser with a difference
  • J. Loveland
  • Computer Science, Medicine
  • Briefings Bioinform.
  • 2005
The Vertebrate Genome Annotation (Vega) database is a community resource for browsing manual annotation of finished sequence from a variety of vertebrate genomes; this manual annotation is more accurate at identifying splice variants, pseudogenes, poly(A) features, non-coding and complex gene structures and arrangements than current automated methods.
Full-length messenger RNA sequences greatly improve genome annotation
It is demonstrated that sequencing of large numbers of full-length transcripts followed by computational mapping greatly improves identification of the complete exon structures of eukaryotic genes.
Using Multiple Alignments to Improve Gene Prediction
N-SCAN can model the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, and insertions and deletions, and its accuracy exceeds that of all previously published whole-genome de novo gene predictors.
Ab initio gene finding in Drosophila genomic DNA.
Results showed that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives, and exact gene prediction needs additional improvement using gene prediction algorithms.
Apollo: a sequence annotation editor
FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.
GeneWise and Genomewise.
Two algorithms are presented: GeneWise, which predicts gene structure using similar protein sequences, and Genomewise, which provides a gene structure final parse across cDNA- and EST-defined spliced structure.