Genome annotation: from sequence to biology

@article{Stein2001GenomeAF,
  title={Genome annotation: from sequence to biology},
  author={Lincoln Stein},
  journal={Nature Reviews Genetics},
  year={2001},
  volume={2},
  pages={493--503}
}
  • L. Stein
  • Published 1 July 2001
  • Biology
  • Nature Reviews Genetics
The genome sequence of an organism is an information resource unlike any that biologists have previously had access to. But the value of the genome is only as good as its annotation. It is the annotation that bridges the gap from the sequence to the biology of the organism. The aim of high-quality annotation is to identify the key features of the genome — in particular, the genes and their products. The tools and resources for annotation are developing rapidly, and the scientific community is… 
Annotation, comparison and databases for hundreds of bacterial genomes.
Bacterial genome annotation.
TLDR
Combining structural and functional annotation across genomes in a comparative manner promotes higher levels of accurate annotation as well as an advanced understanding of genome evolution.
Annotating the Human Proteome
TLDR
The identification and functional annotation of the proteome is of special interest here, and it starts with the identification of genes and transcripts as a prerequisite for proteome annotation.
REVIEW OF TECHNIQUES FOR GENE SEQUENCING, ANNOTATION AND COMPARATIVE GENOMICS
TLDR
This work surveys common methods, techniques, tools and challenges of gene sequencing, annotation and comparative genomics.
Towards multidimensional genome annotation
TLDR
All four levels of genome annotation are discussed, with specific emphasis on two-dimensional annotation methods, and changes in genome sequences that occur during adaptive evolution are studied.
Genomics and Proteomics Using Computational Biology
TLDR
Proteomic mass spectrometry enables sequencing of gene product fragments, which allows the validation and refinement of existing gene annotation as well as the elucidation of novel protein-coding regions, but the application of proteomics data to genome annotation is hindered by the lack of suitable tools and methods.
Computer software to find genes in plant genomic DNA.
TLDR
This chapter discusses the use of different computer programs that identify protein-coding genes in large genomic sequences, and describes the most commonly used gene prediction programs that are available on the World Wide Web.
A Global Approach to Comparative Genomics: Comparison of Functional Annotation over the Taxonomic Tree
TLDR
This thesis implemented a database and a client application that can be used to perform queries with a simplified language and to process and visualize the results, and developed a method for comparing GO annotations which includes a measure of functional similarity between gene products.
Gene Model Detection Using Mass Spectrometry
TLDR
A proteomics-based method identifies open reading frames that are missed by computational algorithms, combining computationally predicted ORFs and the genome sequence with proteomics to identify novel gene models.
In silico gene Characterization and Biological Annotation of Bacillus thuringiensis Genome Sequences
TLDR
This study illustrates the importance of integrative approaches for the automatic annotation of B. thuringiensis genomes using the AMIGene (Annotation of MIcrobial Genes) and FgenesB computational methods, and identifies the Bt genes according to CDS, transcription units and operons.

References

Gene Ontology: tool for the unification of biology
TLDR
The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
The COG database: a tool for genome-scale analysis of protein functions and evolution
TLDR
The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes.
Genomic strategies to identify mammalian regulatory sequences
TLDR
In this review, several genomic approaches that are being used to identify regulatory sequences in mammalian genomes are highlighted.
Saccharomyces Genome Database.
Genome annotation assessment in Drosophila melanogaster.
TLDR
This experiment presents the first assessment of promoter prediction techniques for a significant number of genes in a large contiguous region and finds that the promoter predictors' high false-positive rates make their predictions difficult to use.
Computational inference of homologous gene structures in the human genome.
TLDR
A new gene identification algorithm, GenomeScan, combines exon-intron and splice signal models with similarity to known protein sequences in an integrated model, offering an accurate and efficient automated approach for identifying genes in higher eukaryotic genomes and providing a first-level annotation of the draft human genome.
The FlyBase database of the Drosophila genome projects and community literature.
  • Biology
  • Nucleic Acids Research
  • 2003
TLDR
A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed and there are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.
Initial sequencing and analysis of the human genome
TLDR
The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base