MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, I/O-bound jobs and CPU-bound jobs, which demand different resources, commonly run simultaneously in the same cluster. In the MapReduce framework, parallelization of these two kinds of job has…
MPI has been widely used in High Performance Computing. In contrast, such efficient communication support is lacking in the field of Big Data Computing, where communication is realized by time-consuming techniques such as HTTP/RPC. This paper takes a step toward bridging these two fields by extending MPI to support Hadoop-like Big Data Computing jobs, where…
In the cloud age, heterogeneous application modes on large-scale infrastructures pose challenges for resource utilization and manageability in data centers. Many resource and runtime management systems have been developed or have evolved to address these challenges and related problems from different perspectives. This paper tries to identify the main…
The lomaiviticins are a family of cytotoxic marine natural products that have captured the attention of both synthetic and biological chemists due to their intricate molecular scaffolds and potent biological activities. Here we describe the identification of the gene cluster responsible for lomaiviticin biosynthesis in Salinispora pacifica strains DPJ-0016…
The ability to find services or resources that satisfy some criteria is an important aspect of distributed systems. This paper presents an event-based architecture to support more dynamic discovery scenarios, including efficient discovery of resources whose attributes can change, and continuous monitoring for resources that satisfy a set of constraints.
In this paper, we contrast four approaches for Grid computing, and discuss a computer systems approach in detail. This approach views a Grid as a distributed computer system, and its main concerns are systems abstractions and constructs, such as the Grid equivalents of computer architecture, address space, process, device, file system, user/developer’s…
The Message Passing Interface (MPI) standard and its implementations (such as MPICH and OpenMPI) have been widely used in high-performance computing to provide an efficient communication infrastructure. This paper investigates whether MPI can be adapted to data-intensive computing to substantially speed up Hadoop and MapReduce…
Massive-scale distributed databases like Google's BigTable and Yahoo!'s PNUTS can be modeled as a Distributed Ordered Table, or DOT, which partitions data into regions and supports range queries on the key. Multi-dimensional range queries on DOTs are a fundamental requirement; however, none of the existing schemes works well when considering three critical issues: high…