• Publications
Twister: a runtime for iterative MapReduce
TLDR
We present the programming model and architecture of Twister, an enhanced MapReduce runtime that supports iterative MapReduce computations efficiently, and compare it with Hadoop and DryadLINQ.
  • 905
  • 67
Analysis of Virtualization Technologies for High Performance Computing Environments
TLDR
This paper provides an in-depth analysis of some of today's commonly accepted virtualization technologies, from feature comparison to performance analysis, focusing on their applicability to High Performance Computing environments using FutureGrid resources.
  • 219
  • 15
MapReduce in the Clouds for Science
TLDR
The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure services, offers a very viable alternative to traditional servers and computing clusters.
  • 173
  • 11
Total Synthesis of a Functional Designer Eukaryotic Chromosome
Designer Chromosome. One of the ultimate aims of synthetic biology is to build designer organisms from the ground up. Rapid advances in DNA synthesis have allowed the assembly of complete bacterial...
  • 378
  • 10
Finding Complex Biological Relationships in Recent PubMed Articles Using Bio-LDA
The overwhelming amount of available scholarly literature in the life sciences poses significant challenges to scientists wishing to keep up with important developments related to their research, but...
  • 71
  • 5
A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures
TLDR
We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm.
  • 72
  • 5
Scalable parallel computing on clouds using Twister4Azure iterative MapReduce
TLDR
We extend the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling fault-tolerant execution of a wide array of data mining and data analysis applications on the Azure cloud.
  • 56
  • 4
CINET: A cyberinfrastructure for network science
TLDR
We introduce a newly built and deployed cyberinfrastructure for network science (CINET) that performs network science and analysis of large graphs.
  • 19
  • 4
Cloud computing paradigms for pleasingly parallel biomedical applications
TLDR
We present two biomedical applications, (1) assembly of genome fragments and (2) dimension reduction in the analysis of chemical structures, implemented using the cloud-infrastructure-service-based utility computing models of Amazon AWS and Microsoft Windows Azure as well as the MapReduce-based data processing frameworks Apache Hadoop and Microsoft DryadLINQ.
  • 75
  • 3