Creating the CIPRES Science Gateway for inference of large phylogenetic trees

@article{Miller2010CreatingTC,
  title={Creating the CIPRES Science Gateway for inference of large phylogenetic trees},
  author={Mark A. Miller and Wayne Pfeiffer and Terri Schwartz},
  journal={2010 Gateway Computing Environments Workshop (GCE)},
  year={2010},
  pages={1--8}
}
Understanding the evolutionary history of living organisms is a central problem in biology. Until recently the ability to infer evolutionary relationships was limited by the amount of DNA sequence data available, but new DNA sequencing technologies have largely removed this limitation. As a result, DNA sequence data are readily available or obtainable for a wide spectrum of organisms, thus creating an unprecedented opportunity to explore evolutionary relationships broadly and deeply across the… 

Embedding CIPRES science gateway capabilities in phylogenetics software environments
TLDR
The goal in creating these services is to allow scientists to conduct analyses without leaving their preferred work environment, whether that is a complex desktop application, a set of ad hoc scripted workflows, or the existing CSG browser interface.
The CIPRES science gateway: a community resource for phylogenetic analyses
TLDR
Progress in managing the growth of this public cyberinfrastructure resource is described and the domain science that it has enabled is reviewed.
A composite genome approach to identify phylogenetically informative data from next-generation sequencing
TLDR
A novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome, called SISRS, which has the potential to transform phylogenetic research.
A Gateway for Phylogenetic Analysis Powered by Grid Computing Featuring GARLI 2.0
TLDR
Molecularevolution.org, a publicly available gateway for high-throughput, maximum-likelihood phylogenetic analysis powered by grid computing, is introduced and details about how the grid system efficiently delivers high-quality phylogenetic results are provided.
Investigating the Genomic Distribution of Phylogenetic Signal with CloudForest
TLDR
The architecture of CloudForest is described, including the advantages it provides, and it is used to investigate the distribution of phylogenetic signal along the entire X chromosome of 24 cat (Felidae) species.
The contribution of mitochondrial metagenomics to large-scale data mining and phylogenetic analysis of Coleoptera
TLDR
The combination of data mining and metagenomic sequencing of bulk samples provided the largest phylogenetic tree of Coleoptera to date, which represents a summary of existing phylogenetic knowledge and a defensible tree of great utility, in particular for studies at the intra-familial level, despite some shortcomings for resolving basal nodes.
The CIPRES science gateway: enabling high-impact science for phylogenetics researchers with limited resources
TLDR
The results indicate that the CSG is a critical and cost-effective enabler of science for phylogenetic researchers with limited resources, and is meeting an important need for computational resources in the Systematics/Evolutionary Biology community.
A new view of the tree of life
TLDR
New genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, are used to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included.
Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets
TLDR
The results provide real‐world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large‐scale phylogenomic data analyses and show how data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance.

References

Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion
TLDR
A fast and accurate algorithm allows ML phylogenetic searches to be performed on datasets consisting of thousands of sequences, and the P-GARLI algorithm extends the approach of GARLI to allow simultaneous use of many computer processors.
Phylogenetic analyses of parasites in the new millennium.
Rec-I-DCM3: a fast algorithmic technique for reconstructing phylogenetic trees
TLDR
This paper presents a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to the family of disk-covering methods (DCMs), tests it on ten large biological datasets, and reports dramatic speedups as well as significant improvements in accuracy.
Multiple sequence alignment for phylogenetic purposes
I have addressed the biological rather than bioinformatics aspects of molecular sequence alignment by covering a series of topics that have been under-valued, particularly within the context of
AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis
TLDR
AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches.
TOPALi v2: a rich graphical interface for evolutionary analyses of multiple alignments on HPC clusters and multi-core desktops
TLDR
TOPALi v2 simplifies and automates the use of several methods for the evolutionary analysis of multiple sequence alignments and phylogenetic tree estimation using the Bayesian inference and maximum likelihood approaches.
The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis
TLDR
A versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet that can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results.
RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models
TLDR
RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML) that has been used to compute ML trees on two of the largest alignments to date.
A Web interface generator for molecular biology programs in Unix
TLDR
A Web interface generator for more than 150 molecular biology command-line driven programs, including: phylogeny, gene prediction, alignment, RNA, DNA and protein analysis, motif discovery, structure analysis and database searching programs.
Swami—the next generation biology workbench
TLDR
The design and implementation of Swami, the Next Generation Biology Workbench, are presented: an interactive, Web-based bioinformatics analysis workbench that integrates with many popular protein and nucleic acid sequence databases and a wide variety of analysis and modeling tools.