Protein sequences contain surprisingly many local regions of low compositional complexity. These include different types of residue clusters, some of which contain homopolymers, short-period repeats, or aperiodic mosaics of a few residue types. Several different formal definitions of local complexity and probability are presented here and are compared for …
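As a rough illustration of one way to formalize local compositional complexity, the sketch below computes a combinatorial measure, K = (1/L) log_N(L! / ∏ n_i!), for a window of length L over an alphabet of size N, where n_i counts the residues of type i. This particular definition and the function name `combinatorial_complexity` are assumptions for illustration; the abstract is truncated before it states which definitions it compares.

```python
import math
from collections import Counter


def combinatorial_complexity(window: str, alphabet_size: int = 20) -> float:
    """Per-residue complexity K = (1/L) * log_N(L! / prod(n_i!)).

    Assumed definition (illustrative): the log of the number of distinct
    sequences sharing this window's composition, normalized by the window
    length L and by log of the alphabet size N.
    """
    L = len(window)
    if L == 0:
        raise ValueError("window must be non-empty")
    counts = Counter(window).values()
    # log(L!) - sum(log(n_i!)), via lgamma to avoid huge factorials
    log_multinomial = math.lgamma(L + 1) - sum(math.lgamma(n + 1) for n in counts)
    return log_multinomial / (L * math.log(alphabet_size))
```

A homopolymer window such as "QQQQQQQQQQQQ" has only one arrangement of its composition, so K is 0; a window of twelve different residues gets the largest value attainable at that length.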
Computational methods based on mathematically defined measures of compositional complexity have been developed to distinguish globular and non-globular regions of protein sequences. Compact globular structures in protein molecules are shown to be determined by amino acid sequences of high informational complexity. Sequences of known crystal structure in the …
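The idea of separating low- and high-complexity regions can be sketched with a sliding window: score each window's composition by its Shannon entropy and flag residues covered by any window that falls below a threshold. This is a simplified, hypothetical sketch rather than the published method. The window length of 12 and the threshold of 2.2 bits are illustrative assumptions, and the function names are invented here.

```python
import math
from collections import Counter


def shannon_entropy_bits(window: str) -> float:
    """Shannon entropy (bits per residue) of the window's residue composition."""
    L = len(window)
    return -sum((n / L) * math.log2(n / L) for n in Counter(window).values())


def low_complexity_mask(seq: str, window: int = 12, threshold: float = 2.2) -> list[bool]:
    """Flag residues covered by any window whose entropy is below `threshold`.

    Simplified sketch: no window extension or boundary optimization is done,
    and the default parameters are illustrative assumptions.
    """
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy_bits(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask
```

Masked residues would be candidates for non-globular segments; unmasked stretches would be candidates for compact globular domains.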
Let A denote an alphabet consisting of n types of letters. Given a sequence S of length L with v_i letters of type i from A, to describe the compositional properties and combinatorial structure of S, we propose a new complexity function of S, called the reciprocal complexity of S, defined as $C(S) = \prod_{i=1}^{n} \left( \frac{L}{n\,v_i} \right)^{v_i}$. Based on this complexity …
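A minimal sketch of the reciprocal complexity C(S), computed in log space to avoid underflow on long sequences. It assumes that letter types absent from S (v_i = 0) contribute a factor of 1, because the formula is undefined for them as written. Expanding the logarithm gives ln C(S) = Σ v_i ln(L/v_i) - L ln n = L(H - ln n), where H is the Shannon entropy (in nats) of the composition of S. So C(S) ≤ 1, with equality for a perfectly uniform composition. The function names are hypothetical.

```python
import math
from collections import Counter


def log_reciprocal_complexity(seq: str, n: int) -> float:
    """ln C(S) = sum_i v_i * ln(L / (n * v_i)) over letter types present in S.

    Absent letter types (v_i = 0) are skipped, i.e. their factor is taken as 1
    (an assumption for this sketch).
    """
    L = len(seq)
    return sum(v * math.log(L / (n * v)) for v in Counter(seq).values())


def reciprocal_complexity(seq: str, n: int) -> float:
    """C(S) itself; may underflow to 0.0 for long, highly skewed sequences."""
    return math.exp(log_reciprocal_complexity(seq, n))
```

For a DNA alphabet (n = 4), "ACGT" gives C = 1, while "AAAA" gives (4/16)^4 = 1/256.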
Toxoplasma gondii is a highly successful protozoan parasite in the phylum Apicomplexa, which contains numerous animal and human pathogens. T. gondii is amenable to cellular, biochemical, molecular and genetic studies, making it a model for the biology of this important group of parasites. To facilitate forward genetic analysis, we have developed a …
Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using …
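For the pairwise case, a log-odds substitution score compares how often residues a and b are aligned in related sequences (target frequency q_ab) with how often they would pair by chance (p_a p_b): s(a,b) = scale · log2(q_ab / (p_a p_b)). The sketch below builds such a matrix from hypothetical toy DNA frequencies, using scale = 2 (half-bit units) as an assumed convention, and checks that the expected score of unrelated pairs is negative, as local alignment requires. It does not reproduce the multiple-alignment extension the abstract describes.

```python
import math


def log_odds_matrix(target: dict, background: dict, scale: float = 2.0) -> dict:
    """Integer scores round(scale * log2(q_ab / (p_a * p_b))) for each pair."""
    return {
        (a, b): round(scale * math.log2(q / (background[a] * background[b])))
        for (a, b), q in target.items()
    }


def expected_score(scores: dict, background: dict) -> float:
    """Mean score of randomly paired residues; should be negative."""
    return sum(background[a] * background[b] * s for (a, b), s in scores.items())


alphabet = "ACGT"
# Hypothetical target frequencies: 80% of aligned pairs are identities,
# the remaining 20% spread evenly over the 12 mismatch pairs.
target = {(a, b): (0.2 if a == b else 0.2 / 12) for a in alphabet for b in alphabet}
# Background taken as the marginals of the target distribution (0.25 each).
background = {a: sum(target[(a, b)] for b in alphabet) for a in alphabet}
scores = log_odds_matrix(target, background)
```

With these toy frequencies, identities score +3 and mismatches -4 in half-bit units.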
Different local regions of natural amino acid or nucleotide sequences show remarkable heterogeneity in residue composition, reflecting diversity in evolutionary history and physicochemical constraints. Compositional complexity measures are helpful for describing and understanding this variegation. Motivated by some open problems in comparative genomics and …
We describe a new statistically based algorithm that aligns sequences by means of predictive inference. Using residue frequencies, this Gibbs sampling algorithm iteratively selects alignments in accordance with their conditional probabilities. The newly formed alignments in turn update an evolving residue frequency model. When equilibrium is reached, the most …
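The predictive-update loop can be illustrated with a minimal site sampler: hold out one sequence, build a residue frequency model from the motif positions in all the others, then resample the held-out sequence's motif start with probability proportional to the motif-versus-background likelihood ratio. This is a sketch under simplifying assumptions (DNA only, one motif occurrence of fixed width per sequence, pseudocounts of 1, no phase-shift moves), and the function names are hypothetical; it is not the authors' implementation.

```python
import random

ALPHABET = "ACGT"


def profile_from(seqs, starts, width, skip, pseudo=1.0):
    """Position-specific residue frequencies from all sequences except `skip`."""
    counts = [{c: pseudo for c in ALPHABET} for _ in range(width)]
    for k, (s, st) in enumerate(zip(seqs, starts)):
        if k == skip:
            continue
        for j in range(width):
            counts[j][s[st + j]] += 1
    total = len(seqs) - 1 + pseudo * len(ALPHABET)
    return [{c: col[c] / total for c in ALPHABET} for col in counts]


def gibbs_site_sampler(seqs, width, iterations=500, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    # Background model: overall residue frequencies, with pseudocounts.
    all_res = "".join(seqs)
    bg = {c: (all_res.count(c) + 1) / (len(all_res) + len(ALPHABET)) for c in ALPHABET}
    for _ in range(iterations):
        k = rng.randrange(len(seqs))  # sequence held out this round
        prof = profile_from(seqs, starts, width, skip=k)
        s = seqs[k]
        weights = []
        for st in range(len(s) - width + 1):
            # Likelihood ratio of the motif model vs. background for this window
            ratio = 1.0
            for j in range(width):
                ratio *= prof[j][s[st + j]] / bg[s[st + j]]
            weights.append(ratio)
        # Sample a new start in proportion to its weight
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

For example, `gibbs_site_sampler(["GCGTTGACATTAGC", "TTGACAGCGCATGC", "CGATCGTTGACAGA"], 6)` returns three start positions. In a run that converges, these tend to align the shared "TTGACA" occurrences, though a single short run is not guaranteed to find them.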
  • Michael S. Behnke, John C. Wootton, Margaret M. Lehmann, Josh B. Radke, Olivier Lucas, Julie Nawas +2 others
  • 2010
BACKGROUND: Apicomplexan parasites replicate by varied and unusual processes in which the typically eukaryotic expansion of cellular components and the chromosome cycle are coordinated with the biosynthesis of parasite-specific structures essential for transmission. METHODOLOGY/PRINCIPAL FINDINGS: Here we describe the global cell cycle transcriptome of the …
Given vast quantities of molecular sequence data, and numerous different algorithms designed to discover, diagnose or model biologically interesting features in sequences, how is it possible to make objective evaluations of the diagnostic effectiveness of these algorithms and robust assessments of their relative strengths and limitations? An approach to …
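One common way to quantify a diagnostic algorithm's effectiveness is to score labeled examples (true features vs. non-features) and summarize performance with sensitivity, specificity, and the area under the ROC curve. The abstract is truncated before it describes its own approach, so this is only an assumed, generic illustration; the function names are hypothetical. The AUC is computed here as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half.

```python
def sensitivity_specificity(scores, labels, threshold):
    """(sensitivity, specificity) when calling `score >= threshold` positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

Because the AUC does not depend on any single threshold, it allows algorithms whose scores are on different scales to be compared.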