Learn More
In recent years improvements to existing programs and the introduction of new iterative algorithms have changed the state-of-the-art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10-20% residue(More)
Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have lead to an explosion in the amount of(More)
BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the(More)
MOTIVATION Most multiple sequence alignment programs use heuristics that sometimes introduce errors into the alignment. The most commonly used methods to correct these errors use iterative techniques to maximize an objective function. We present here an alternative, knowledge-based approach that combines a number of recently developed methods into a(More)
A comprehensive investigation of ribosomal genes in complete genomes from 66 different species allows us to address the distribution of r-proteins between and within the three primary domains. Thirty-four r-protein families are represented in all domains but 33 families are specific to Archaea and Eucarya, providing evidence for specialisation at an early(More)
The elucidation of the complex relationships linking genotypic and phenotypic variations to protein structure is a major challenge in the post-genomic era. We present MSV3d (Database of human MisSense Variants mapped to 3D protein structure), a new database that contains detailed annotation of missense variants of all human proteins (20 199 proteins). The(More)
Peroxisomes are essential organelles of eukaryotic origin, ubiquitously distributed in cells and organisms, playing key roles in lipid and antioxidant metabolism. Loss or malfunction of peroxisomes causes more than 20 fatal inherited conditions. We have created a peroxisomal database (http://www.peroxisomeDB.org) that includes the complete peroxisomal(More)
BACKGROUND In silico analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may really be present in the organism or may result from misannotation based on sequencing errors. The reality or otherwise of these sequences has major(More)
PipeAlign is a protein family analysis tool integrating a five step process ranging from the search for sequence homologues in protein and 3D structure databases to the definition of the hierarchical relationships within and between subfamilies. The complete, automatic pipeline takes a single sequence or a set of sequences as input and constructs a(More)