Annotation confidence score for genome annotation: a genome comparison approach
@article{Yang2010AnnotationCS,
title={Annotation confidence score for genome annotation: a genome comparison approach},
author={Youngik Yang and Donald G. Gilbert and Sun Kim},
journal={Bioinformatics},
year={2010},
volume={26},
number={1},
pages={22--29}
}

MOTIVATION
Massively parallel sequencing technology allows small research labs to generate genome sequences of interest to their research. However, genome annotation still relies on manual processes, which become a serious bottleneck for high-throughput genome projects. Automatic annotation methods have recently become more accurate, but several issues remain. One important challenge in using automatic annotation methods is distinguishing the annotation quality of ORFs…
13 Citations
Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing
- Biology
- Biology
- 2020
Structural and functional annotation are summarized, along with the associated comparative annotation tools and pipelines for each.
Experience report: issues in comparing gene function annotation in text
- Biology, Computer Science
- SIGDOC '09
- 2009
Issues in comparing genome annotation in a text format are discussed and a computational method for comparing gene annotation in text is developed that was able to handle many difficult cases (syntactically different but semantically equivalent gene function annotations) correctly.
BEACON: automated tool for Bacterial GEnome Annotation ComparisON
- Biology
- BMC Genomics
- 2015
An Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON), a fast tool for an automated and a systematic comparison of different annotations of single genomes that benefits both AM developers and annotation analysers.
Gene Cluster Prediction and Its Application to Genome Annotation
- Biology
- 2011
This chapter surveys a few of the prominent techniques in gene function assignment and performs simple experiments to detect gene clusters across a given set of genomes and provides a few examples to show how gene cluster information can be applied to genome annotation and can resolve ambiguities in function assignment.
EnzymeDetector: an integrated enzyme function prediction tool and database
- Biology, Computer Science
- BMC Bioinformatics
- 2011
This tool automatically compares and evaluates the assigned enzyme functions from the main annotation databases and supplements them with its own function prediction, and provides a fast and comprehensive overview of the available enzyme function annotations for a genome of interest.
Functional coherence metrics in protein families
- Biology, Computer Science
- J. Biomed. Semant.
- 2016
This work proposes a comprehensive approach for assessing functional coherence within protein sets using visualization and term enrichment techniques anchored in specific domain knowledge, such as a protein family, that combine aspects of semantic similarity measures and term enrichment.
The language of gene ontology: a Zipf’s law analysis
- Computer Science
- BMC Bioinformatics
- 2011
GO annotations show statistical behaviours similar to those seen in natural language, with measured exponents that correlate with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
Tracing Evolutionary Footprints to Identify Novel Gene Functional Linkages
- Biology
- PLoS ONE
- 2013
TRACE is an effective new method that infers prokaryotic gene functional linkages from evolutionary events by tracing gene footprints through a gene functional network constructed from 341 prokaryotic genomes.
Distribution of human genes observes Zipf's law
- Biology
- 2012
This paper investigates possible mathematical models, namely Benford's and Zipf's laws, to describe the distribution of gene positions on human chromosomes, and suggests that the analysis of gene distribution on chromosomes may contribute not only to better gene detection, but also to better gene annotation, which is particularly relevant to high-throughput genome projects.