Phylogenomic analyses of large sets of genes or proteins have the potential to revolutionize our understanding of the tree of life. However, problems arise because estimated phylogenies from individual loci often differ because of different histories, systematic bias, or stochastic error. We have developed Concaterpillar, a hierarchical clustering method…
MOTIVATION: Expressed sequence tag (EST) surveys are an efficient way to characterize large numbers of genes from an organism. The rate of gene discovery in an EST survey depends on the degree of redundancy of the cDNA libraries from which sequences are obtained. However, few statistical methods have been developed to assess and compare redundancies of…
Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments that incorporate the average amino acid frequencies of the data set under examination (e.g., JTT + F). Variation in the evolutionary…
Evolutionary rates vary among sites and across the phylogenetic tree (heterotachy). A recent analysis suggested that parsimony can be better than standard likelihood at recovering the true tree given heterotachy. The authors recommended that results from parsimony, which they consider to be nonparametric, be reported alongside likelihood results. They also…
A confidence region for topologies is a data-dependent set of topologies that, with high probability, can be expected to contain the true topology. Because of the connection between confidence regions and hypothesis tests, implicitly or explicitly, the construction of confidence regions for topologies is a component of many phylogenetic studies. Existing…
Microsporidia branch at the base of eukaryotic phylogenies inferred from translation elongation factor 1alpha (EF-1alpha) sequences. Because these parasitic eukaryotes are fungi (or close relatives of fungi), it is widely accepted that fast-evolving microsporidian sequences are artifactually "attracted" to the long branch leading to the archaebacterial…
It has long been recognized that the rates of molecular evolution vary among sites in proteins. The usual model for rate heterogeneity assumes independent rate variation according to a rate distribution. In such models the rate at a site, although random, is assumed fixed throughout the evolutionary tree. Recent work by several groups has suggested that…
Since Darwin's Origin of Species, reconstructing the Tree of Life has been a goal of evolutionists, and tree-thinking has become a major concept of evolutionary biology. Practically, building the Tree of Life has proven to be tedious. Too few morphological characters are useful for conducting conclusive phylogenetic analyses at the highest taxonomic level…
It has recently been proposed that a well-resolved Tree of Life can be achieved through concatenation of shared genes. There are, however, several difficulties with such an approach, especially in the prokaryotic part of this tree. We tackled some of them using a new combination of maximum likelihood-based methods, developed in order to practice as safe and…
Here, we address a much-debated topic: is there or is there not an organismal tree of gamma-proteobacteria that can be unambiguously inferred from a core of shared genes? We apply several recently developed analytical methods to this problem for the first time. Our heat map analyses of P values and of bootstrap bipartitions show the presence of conflicting…