Computational genomics

  • Eugene V. Koonin
  • Published 6 March 2001
  • Medicine
  • Current Biology
We now know how to read the sequences of nucleotide letters that comprise the genome at a rather frightening speed: a several-million-base bacterial genome in several days is not a problem for one of the sequencing centers, and a billion-base eukaryotic genome can be done in less than a year. But reading a text and understanding it are two different things. So how well can we understand the genome sequences? The answer to this question is central to the whole enterprise of genomics, and this is…
Cataloguing proteins in cell cycle control
Bioinformatics makes a number of methods available that can also be used to identify cell cycle related proteins. Nevertheless, few tools are specifically designed to cope with cell cycle proteins.
Unravelling the ORFan Puzzle
It is demonstrated that ORFans are an untapped source of research, requiring further computational and experimental studies, and some of the studies aimed at understanding ORFans, their functions and their origins are reviewed.
Better prediction of sub‐cellular localization by combining evolutionary and structural information
This work explored the evolutionary information contained in multiple alignments and aspects of protein structure to predict localization in the absence of homology and targeting motifs, and developed two separate systems that were at their best for extra‐cellular and nuclear proteins and significantly less accurate than TargetP for mitochondrial proteins.
Multiscale DNA partitioning: statistical evidence for segments
This work focuses on partitioning with respect to GC content and proposes a new approach that provides statistical error control, which is based on a statistical multiscale criterion, rendering this as a segmentation method that searches segments of any length (on all scales) simultaneously.
Comparative and Evolutionary Genomics of Pseudomonas syringae
To overcome the computational challenges of large-scale comparative genome analysis, a novel comparative genomic pipeline named DeNoGAP is designed, which provides a robust computational pipeline for performing various comparative genomics tasks, such as gene prediction, ortholog prediction, functional annotation, and so on.
Automatic prediction of protein function
Computational biologists have begun to develop ab initio methods that predict aspects of function, including subcellular localization, post-translational modifications, functional type and protein-protein interactions, where the most accurate approaches rely on identifying short signalling motifs, while the most general methods utilise tools of artificial intelligence.
Mimicking cellular sorting improves prediction of subcellular localization.
LOCtree is introduced, a hierarchical system combining support vector machines (SVMs) and other prediction methods that predicts the subcellular compartment of a protein by mimicking the mechanism of cellular sorting and exploiting a variety of sequence and predicted structural features in its input.
Evaluating Ortholog Prediction Algorithms in a Yeast Model Clade
The results suggest that simple algorithms, like cRBH, may be better ortholog predictors than more complex ones for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs.
Quantitative assessment of relationship between sequence similarity and function similarity
This study provides a benchmark to estimate the confidence in assignment of functions purely based on sequence similarity; it quantifies the correlation between functional similarity and sequence similarity, measured by sequence identity or statistical significance of the alignment, and compares this correlation against randomly chosen protein pairs.
Protein classification using probabilistic chain graphs and the Gene Ontology structure
Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms.


Combining diverse evidence for gene recognition in completely sequenced bacterial genomes
A new program, ORPHEUS, is presented that identifies candidate genes and accurately predicts gene starts. It is shown that the program correctly identified 93.3% of experimentally annotated genes longer than 100 codons described in the PIR-International database, and 92.9% of predicted starts coincided with the feature table description.
Computational molecular biology - an algorithmic approach
In one of the first major texts in the emerging field of computational molecular biology, Pavel Pevzner covers a broad range of algorithmic and combinatorial topics and shows how they are connected…
A genomic perspective on protein families.
Comparison of proteins encoded in seven complete genomes from five major phylogenetic lineages and elucidation of consistent patterns of sequence similarities allowed the delineation of 720 clusters of orthologous groups (COGs), which comprise a framework for functional and evolutionary genome analysis.
Bioinformatics - a practical guide to the analysis of genes and proteins
  • A. Baxevanis
  • Biology, Computer Science
  • Methods of biochemical analysis
  • 1998
This work focuses on the development of novel approaches to biological analysis, using Perl to facilitate biological analysis, and its applications in proteomics and protein identification.
Pattern of selective constraint in C. elegans and C. briggsae genomes.
Similarity between related genomes may carry information on selective constraint in each of them. We analysed patterns of similarity between several homologous regions of Caenorhabditis elegans and…
Comparison of the complete protein sets of worm and yeast: orthology and divergence.
Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out…
Initial sequencing and analysis of the human genome
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and…
Predicting functions from protein sequences—where are the bottlenecks?
The exponential growth of sequence data does not necessarily lead to an increase in knowledge about the functions of genes and their products, so the identification, verification and annotation of functional features need to be drastically improved.
The Sequence of the Human Genome
J. Craig Venter, Mark D. Adams, Eugene W. Myers, Peter W. Li, Richard J. Mural, Granger G. Sutton, Hamilton O. Smith, Mark Yandell, Cheryl A. Evans, Robert A. Holt, Jeannine D. Gocayne, Peter…
Ouellette BFF: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins
  • 2001