Tiffani L. Williams

Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to …
BACKGROUND: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds distance matrix, which has many …
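As an illustration of the all-to-all Robinson-Foulds (RF) matrix mentioned above, here is a minimal serial sketch in Python. It is not the MapReduce design evaluated in the paper. It assumes each tree has already been reduced to its set of non-trivial bipartitions, with each bipartition stored as the `frozenset` of taxa on the side that does not contain taxon `A`. The function names and the three example trees are hypothetical.

```python
from itertools import combinations


def rf_distance(splits_a, splits_b):
    """Count the bipartitions that appear in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


def rf_matrix(trees):
    """Build the symmetric all-to-all RF distance matrix for a list of trees."""
    n = len(trees)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = rf_distance(trees[i], trees[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical unrooted trees on taxa A..E, written as bipartition sets.
trees = [
    {frozenset("CDE"), frozenset("DE")},  # ((A,B),C,(D,E))
    {frozenset("BDE"), frozenset("DE")},  # ((A,C),B,(D,E))
    {frozenset("CDE"), frozenset("CE")},  # ((A,B),D,(C,E))
]

for row in rf_matrix(trees):
    print(row)
```

The pairwise loop performs one comparison per unordered pair of trees, so the work grows quadratically with the number of trees. That growth is what makes the matrix a natural candidate for parallel frameworks.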
Consensus trees are a popular approach for summarizing the shared evolutionary relationships in a collection of trees. Many popular techniques such as Bayesian analyses produce results that can contain tens of thousands of trees to summarize. We develop a fast consensus algorithm called HashCS to construct large-scale consensus trees. We perform an …
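To make the idea of a consensus tree concrete, the sketch below counts how often each bipartition occurs across a collection and keeps those that meet a strict or majority threshold. It is a generic illustration, not the HashCS algorithm itself, whose details are not given here. It assumes each tree is supplied as a set of non-trivial bipartitions (the `frozenset` of taxa on the side without taxon `A`). The function names and example trees are hypothetical.

```python
from collections import Counter


def split_counts(trees):
    """Count how many trees contain each bipartition."""
    return Counter(split for tree in trees for split in tree)


def strict_consensus(trees):
    """Keep only the bipartitions present in every tree."""
    counts = split_counts(trees)
    return {s for s, c in counts.items() if c == len(trees)}


def majority_consensus(trees):
    """Keep the bipartitions present in more than half of the trees."""
    counts = split_counts(trees)
    return {s for s, c in counts.items() if 2 * c > len(trees)}


# Hypothetical unrooted trees on taxa A..E, written as bipartition sets.
trees = [
    {frozenset("CDE"), frozenset("DE")},  # ((A,B),C,(D,E))
    {frozenset("BDE"), frozenset("DE")},  # ((A,C),B,(D,E))
    {frozenset("CDE"), frozenset("CE")},  # ((A,B),D,(C,E))
]

print(strict_consensus(trees))
print(majority_consensus(trees))
```

In this example no bipartition is shared by all three trees, so the strict consensus is fully unresolved, while the majority consensus retains the two bipartitions that occur in two of the three trees.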
Large and comprehensive phylogenetic trees are desirable for studying macroevolutionary processes and for classification purposes. One approach for obtaining large phylogenies is to combine the topologies (or source trees) from previous phylogenetic studies. Tree reconstruction techniques that use the above methodology are known as supertree methods. In …
Phylogenetic trees are tree structures that depict relationships between organisms. Popular analysis techniques often produce large collections of candidate trees, which are expensive to store. We introduce TreeZip, a novel algorithm to compress phylogenetic trees based on their shared evolutionary relationships. We evaluate TreeZip's performance on …
Many large-scale phylogenetic reconstruction methods attempt to solve hard optimization problems, such as Maximum Parsimony (MP) and Maximum Likelihood (ML), but they are severely limited by the number of taxa that they can handle in a reasonable time frame. A standard heuristic approach to this problem is the divide-and-conquer strategy: decompose the …
For centuries, the research paper has been the main vehicle for scientific progress. From the paper, readers in the scientific community are expected to extract all the relevant information necessary to reproduce and validate the results presented by the paper's authors. However, the increased use of computer software in science makes reproducing …
Phylogenetic analyses often produce a large number of candidate evolutionary trees, each a hypothesis of the "true" tree. Post-processing techniques such as strict consensus trees are widely used to summarize the evolutionary relationships into a single tree. However, valuable information is lost during the summarization process. A more elementary step is …
Trends in parallel computing indicate that heterogeneous parallel computing will be one of the most widespread platforms for computation-intensive applications. A heterogeneous computing environment offers considerably more computational power at a lower cost than a parallel computer. We propose the Heterogeneous Bulk Synchronous Parallel (HBSP) model, which …