MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, I/O-bound jobs and CPU-bound jobs, which demand different resources, commonly run simultaneously in the same cluster. In the MapReduce framework, parallelization of these two kinds of jobs has…
In this paper, we contrast four approaches to Grid computing and discuss a computer systems approach in detail. This approach views a Grid as a distributed computer system, and its main concerns are systems abstractions and constructs, such as the Grid equivalents of computer architecture, address space, process, device, file system, user/developer's…
The ability to find services or resources that satisfy some criteria is an important aspect of distributed systems. This paper presents an event-based architecture to support more dynamic discovery scenarios, including efficient discovery of resources whose attributes can change, and continuous monitoring for resources that satisfy a set of constraints.
The Message Passing Interface (MPI) standard and its implementations (such as MPICH and OpenMPI) have been widely used in high-performance computing to provide an efficient communication infrastructure. This paper investigates whether MPI can be adapted to data-intensive computing to substantially speed up Hadoop and MapReduce…
MPI has been widely used in High Performance Computing. In contrast, such efficient communication support is lacking in the field of Big Data Computing, where communication is realized by time-consuming techniques such as HTTP/RPC. This paper takes a step toward bridging these two fields by extending MPI to support Hadoop-like Big Data Computing jobs, where…
Massive-scale distributed databases such as Google's BigTable and Yahoo!'s PNUTS can be modeled as a Distributed Ordered Table, or DOT, which partitions data into regions and supports range queries on keys. Multi-dimensional range queries on DOTs are a fundamental requirement; however, none of the existing schemes works well when considering three critical issues: high…
Virtual organizations (VOs) are widely accepted in grid and other distributed computing environments. However, there are few effective VO implementations. This paper presents a layered architecture to construct Agora, an implementation of VO. Agora manages users, resources, and agora instances, and provides policies to support a DAC/MAC-hybrid cross-domain…
MapReduce is gaining popularity as a parallel programming model for large-scale data processing. We find, however, that some traditional MapReduce platforms perform poorly in terms of cluster resource utilization, since the traditional multi-phase parallel model and some existing scheduling policies used in the cluster environment have some…