A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and …
Protein sequences contain surprisingly many local regions of low compositional complexity. These include different types of residue clusters, some of which contain homopolymers, short-period repeats or aperiodic mosaics of a few residue types. Several different formal definitions of local complexity and probability are presented here and are compared for …
Computational methods based on mathematically defined measures of compositional complexity have been developed to distinguish globular and non-globular regions of protein sequences. Compact globular structures in protein molecules are shown to be determined by amino acid sequences of high informational complexity. Sequences of known crystal structure in the …
Widespread use of antimalarial agents can profoundly influence the evolution of the human malaria parasite Plasmodium falciparum. Recent selective sweeps for drug-resistant genotypes may have restricted the genetic diversity of this parasite, resembling effects attributed in current debates to a historic population bottleneck. Chloroquine-resistant (CQR) …
Let A denote an alphabet consisting of n types of letters. Given a sequence S of length L with $v_i$ letters of type i on A, to describe the compositional properties and combinatorial structure of S, we propose a new complexity function of S, called the reciprocal complexity of S, as $C(S) = \prod_{i=1}^{n} \left( \frac{L}{n v_i} \right)^{v_i}$. Based on this complexity …
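As an illustration of the reciprocal complexity defined above, read as the product over the n letter types of (L / (n·v_i)) raised to the power v_i, the following Python sketch evaluates it for a short sequence. The function name `reciprocal_complexity` and the convention that a letter type with zero count contributes a factor of 1 are assumptions made for this example, not details taken from the abstract.

```python
from collections import Counter
from math import exp, log


def reciprocal_complexity(seq, alphabet):
    """Evaluate C(S) = prod_i (L / (n * v_i)) ** v_i for a sequence.

    Assumption: letter types absent from the sequence (v_i = 0)
    contribute a factor of 1, i.e. x ** 0 is taken as 1.
    """
    n = len(alphabet)          # number of letter types
    L = len(seq)               # sequence length
    counts = Counter(seq)
    if any(ch not in alphabet for ch in counts):
        raise ValueError("sequence contains letters outside the alphabet")

    # Sum logarithms instead of multiplying directly, to avoid
    # overflow or underflow on long sequences.
    log_c = 0.0
    for letter in alphabet:
        v = counts.get(letter, 0)
        if v > 0:
            log_c += v * log(L / (n * v))
    return exp(log_c)


uniform = reciprocal_complexity("ACGT", "ACGT")      # every factor equals 1
homopolymer = reciprocal_complexity("AAAA", "ACGT")  # (4 / 16) ** 4
```

Under this reading, a perfectly uniform composition gives C(S) = 1 and any skew in composition gives a smaller value, since the logarithm of C(S) equals L times (Shannon entropy of the composition minus ln n).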
Genetic investigations of malaria require a genome-wide, high-resolution linkage map of Plasmodium falciparum. A genetic cross was used to construct such a map from 901 markers that fall into 14 inferred linkage groups corresponding to the 14 nuclear chromosomes. Meiotic crossover activity in the genome proved high (17 kilobases per centimorgan) and notably …
Toxoplasma gondii is a highly successful protozoan parasite in the phylum Apicomplexa, which contains numerous animal and human pathogens. T. gondii is amenable to cellular, biochemical, molecular and genetic studies, making it a model for the biology of this important group of parasites. To facilitate forward genetic analysis, we have developed a …
Toxoplasma gondii strains differ dramatically in virulence despite being genetically very similar. Genetic mapping revealed two closely adjacent quantitative trait loci on parasite chromosome VIIa that control the extreme virulence of the type I lineage. Positional cloning identified the candidate virulence gene ROP18, a highly polymorphic serine-threonine …
Different local regions of natural amino acid or nucleotide sequences show remarkable heterogeneity in residue composition, reflecting diversity in evolutionary history and physicochemical constraints. Compositional complexity measures are helpful for describing and understanding this variegation. Motivated by some open problems in comparative genomics and …
Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using …
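To make the log-odds form mentioned above concrete for the pairwise case only, the Python sketch below computes scores of the standard form s(a, b) = log2(q_ab / (p_a · p_b)), where q_ab is the target frequency of the aligned pair and p_a, p_b are background frequencies. The function name `log_odds_scores`, the two-letter alphabet, and all frequencies are hypothetical values chosen for illustration; the extension to multiple alignments is not described in the excerpt and is not attempted here.

```python
from math import log2


def log_odds_scores(target_freqs, background):
    """Pairwise log-odds substitution scores, in bits.

    target_freqs: dict mapping (a, b) pairs to target frequencies q_ab
    background:   dict mapping letters to background frequencies p_a
    """
    scores = {}
    for (a, b), q in target_freqs.items():
        # Ratio of how often the pair is seen in true alignments to how
        # often it would appear by chance under the background model.
        scores[(a, b)] = log2(q / (background[a] * background[b]))
    return scores


# Hypothetical two-letter example: matches are enriched over chance.
background = {"A": 0.5, "B": 0.5}
target = {("A", "A"): 0.4, ("A", "B"): 0.1,
          ("B", "A"): 0.1, ("B", "B"): 0.4}
scores = log_odds_scores(target, background)
```

In this toy setup, matching pairs receive positive scores and mismatches negative ones, because their target frequencies lie respectively above and below the chance expectation of 0.25.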