Integration of Clustering and Multidimensional Scaling to Determine Phylogenetic Trees as Spherical Phylograms Visualized in 3 Dimensions

Yang Ruan, Geoffrey L. House, Saliya Ekanayake, Ursel Schutte, James D. Bever, Haixu Tang, Geoffrey Charles Fox
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Phylogenetic analysis is commonly used to analyze genetic sequence data from fungal communities, while ordination and clustering techniques commonly are used to analyze sequence data from bacterial communities. However, few studies have attempted to link these two independent approaches. In this paper, we propose a method, which we call spherical phylogram (SP), to display the phylogenetic tree within the clustering and visualization result from a pipeline called DACIDR. In comparison with…
Phylogenetic trees are constructed from the sequences of different species. They are needed to find the relationships between the different species and also different time gaps from the…
Phylogenetic analysis is a means of estimating evolutionary or historical relationships among a group of organisms based on their genetic closeness. Phylogenetic trees are constructed using different tree…
Phylogenetically Structured Differences in rRNA Gene Sequence Variation among Species of Arbuscular Mycorrhizal Fungi and Their Implications for Sequence Clustering
The results indicate that OTU-based inferences about AM fungal species composition from environmental sequences can be improved if they take these taxonomically structured patterns of sequence variation into account.
Multidimensional Scaling for Genomic Data
An overview of both metric and non-metric MDS methods and their application to genomic data analyses is given.
TSmap3D: Browser Visualization of High Dimensional Time Series Data
Large volumes of high dimensional time series data are increasingly becoming commonplace, and the ability to project such data into three dimensional space to visually inspect them is an important…
TSmap3D: Browser visualization of high dimensional time series data
An MDS-based approach to project high dimensional time series data to 3D, with automatic transformation to align successive data segments, is presented, along with an open source commodity visualization of three-dimensional time series in a web browser based on Three.js.
Product modular analysis with design structure matrix using a hybrid approach based on MDS and clustering
Modular analysis using the Design Structure Matrix (DSM) identifies the interactions between groups of components, and clusters them into modules in order to achieve competitive advantages…
Entropy-Isomap: Manifold Learning for High-dimensional Dynamic Processes
A novel method, Entropy-Isomap, is proposed to address the issue of off-the-shelf non-linear spectral dimensionality reduction methods failing for high-dimensional data sets, and is successfully applied to large data describing a fabrication process of organic materials.
Dimension Reduction and Visualization of the Structure of Financial Systems
This paper describes the initial results of a study of the structure of financial markets viewed as collections of securities generating a time series of values. This study should be viewed as an…
A Collective Communication Layer for the Software Stack of Big Data Analytics
  • Bingjing Zhang
  • Computer Science
  • 2016 IEEE International Conference on Cloud Engineering Workshop (IC2EW)
  • 2016
In this thesis research, a distributed programming model, MapCollective, is defined so that it can be easily applied to many machine learning algorithms that fit the iterative computation model and can be easily parallelized with a unique collective communication layer for efficient synchronization.


A general species delimitation method with applications to phylogenetic placements
The Poisson tree processes (PTP) model is introduced to infer putative species boundaries on a given phylogenetic input tree and yields more accurate results than de novo species delimitation methods.
Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets
This study demonstrates the use of interpolative MDS to obtain clustering results that are qualitatively similar to those obtained through full MDS, but with substantial cost savings, and reduces the wall clock time required to cluster a set of 100,000 sequences.
Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering
An unsupervised Bayesian clustering method termed Clustering 16S rRNA for OTU Prediction (CROP) is proposed that can find clusters based on the natural organization of data without setting a hard cut-off threshold (3%/5%) as required by hierarchical clustering methods.
DACIDR: deterministic annealed clustering with interpolative dimension reduction using a large collection of 16S rRNA sequences
DACIDR is proposed: a parallel sequence clustering and visualization pipeline, which can address the overestimation problem along with space and time complexity issues while giving robust results.
The neighbor-joining method: a new method for reconstructing phylogenetic trees.
The neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods for reconstructing phylogenetic trees from evolutionary distance data.
Visualizing the protein sequence universe
A multi-dimensional scaling (MDS) implementation is described to create a 3D embedding of the Protein Sequence Universe that allows visualizing the relationships between large numbers of proteins and shows that the low-dimensional representation preserves important grouping features.
MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.
The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models, inferring ancestral states and sequences, and estimating evolutionary rates site-by-site.
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
This work presents some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees.
Accounting for uncertainty in species delineation during the analysis of environmental DNA sequence data
The approach described here represents an objective, theory-based method for predicting species boundaries and explicitly incorporates uncertainty in the classification system into biodiversity estimation, thus allowing researchers to better address the causes and consequences of biodiversity.
Sequence-based species delimitation for the DNA taxonomy of undescribed insects.
Cataloging the very large number of undescribed species of insects could be greatly accelerated by automated DNA based approaches, but procedures for large-scale species discovery from sequence data…