Steady progress and recent breakthroughs in the accuracy of automated genome annotation

@article{Brent2008SteadyPA,
  title={Steady progress and recent breakthroughs in the accuracy of automated genome annotation},
  author={Michael R. Brent},
  journal={Nature Reviews Genetics},
  year={2008},
  volume={9},
  pages={62--73}
}
  • M. Brent
  • Published 2008
  • Biology, Medicine
  • Nature Reviews Genetics
The sequencing of large, complex genomes has become routine, but understanding how sequences relate to biological function is less straightforward. Although much attention is focused on how to annotate genomic features such as developmental enhancers and non-coding RNAs, there is still no higher eukaryote for which we know the correct exon–intron structure of at least one ORF for each gene. Despite this uncomfortable truth, genome annotation has made remarkable progress since the first drafts…
Computational Gene Prediction in Eukaryotic Genomes
TLDR
Because of the large amount of genomic data, in silico methods are needed for the genome annotation task; genome sequences are annotated mostly using computational gene prediction programs.
Using comparative genome analysis to identify problems in annotated microbial genomes.
TLDR
It is discussed and demonstrated how the methods of comparative genome analysis can refine annotations by locating missing orthologues, and it is shown that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools.
Finding genes in genome sequence.
  • A. McHardy
  • Biology, Medicine
  • Methods in molecular biology
  • 2008
TLDR
The state of the art in automated gene finding is described, along with the biological basis, computational approaches, and corresponding programs available for the automated identification of protein-coding genes.
Developing a bioinformatics framework for proteogenomics
TLDR
It is critically important to incorporate proteomics data into genome annotation pipelines to provide experimental protein-coding evidence, and this thesis addresses the existing gap between the use of genomic and proteomic sources for accurate genome annotation by applying a proteogenomics approach with a customised methodology.
Comparative Genome Annotation.
TLDR
Methods for comparative structural genome annotation include classical approaches such as the alignment of protein sequences or protein profiles against the genome and comparative gene prediction methods that exploit a genome alignment to annotate a target genome.
Similar Ratios of Introns to Intergenic Sequence across Animal Genomes
TLDR
The results indicate that most animals transcribe half or more of their genomes, arguing against differences in genome usage between animal groups and also suggesting that the transcribed portion is more dependent on genome size than previously thought.
Approaches to Fungal Genome Annotation
TLDR
The application of the latest technologies and tools for eukaryotic genome annotation, with a focus on the annotation of fungal nuclear and mitochondrial genomes, is highlighted to improve the quality of predicted gene sets.
The functional repertoires of metazoan genomes
  • C. Ponting
  • Biology, Medicine
  • Nature Reviews Genetics
  • 2008
TLDR
Metazoan genomes are being sequenced at an increasingly rapid rate, and it is here, encoded in lineage-specific and functional sequence, that the physiological differences between species are expected to be most concentrated.
Similar ratios of introns to intergenic sequence across animal genomes
TLDR
It is shown that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, and that when large-genome invertebrates are considered, the fraction of the genome that is genes appears to be strongly predictable by genome size.
Proteogenomics: Recycling Public Data to Improve Genome Annotations.
TLDR
This chapter describes a proteogenomics procedure using information from the proteome, transcriptome, and genome (thus utilizing each component of the central dogma) to annotate genetic elements in eukaryotes.

References

SHOWING 1-10 OF 63 REFERENCES
Genome annotation past, present, and future: how to define an ORF at each locus.
  • M. Brent
  • Biology, Medicine
  • Genome research
  • 2005
TLDR
The state of gene prediction roughly 10 years ago is reviewed, the progress that has been made since is summarized, it is argued that the primary ORF identification methods so far are inadequate, and a path toward completing the Catalog of Protein Coding Genes, Version 1.0 is recommended.
CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes
TLDR
This study reports a computational method, CEGMA (Core Eukaryotic Genes Mapping Approach), for building a highly reliable set of gene annotations in the absence of experimental data, and defines a set of conserved protein families that occur in a wide range of eukaryotes and presents a mapping procedure that accurately identifies their exon-intron structures in a novel genomic sequence.
Large-scale analysis of pseudogenes in the human genome.
Pseudogenes are considered as genomic fossils: disabled copies of functional genes that were once active in the ancient genome. Recently, whole-genome computational approaches have revealed thousands…
Gene finding in the chicken genome
TLDR
De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.
Initial sequencing and comparative analysis of the mouse genome.
TLDR
The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies.
TLDR
The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
Targeted discovery of novel human exons by comparative genomics.
TLDR
A genome-wide effort to identify human genes not yet in the gene catalogs was carried out as part of the Mammalian Gene Collection project: gene predictions were produced by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, and the predicted novel genes were then tested by RT-PCR.
Leveraging the mouse genome for gene prediction in human: from whole-genome shotgun reads to a global synteny map.
TLDR
It is shown that TWINSCAN improves gene prediction in human using intermediate products from various stages of the sequencing and analysis of the mouse genome, from low-redundancy, whole-genome shotgun reads to the draft assembly and the synteny map.
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
TLDR
Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.
What is a gene, post-ENCODE? History and updated definition.
TLDR
This definition side-steps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene.