MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, I/O-bound jobs and CPU-bound jobs, which demand different resources, commonly run simultaneously in the same cluster. In the MapReduce framework, parallelization of these two kinds of job has …
MPI has been widely used in High Performance Computing. In contrast, such efficient communication support is lacking in the field of Big Data Computing, where communication is realized by time-consuming techniques such as HTTP/RPC. This paper takes a step toward bridging these two fields by extending MPI to support Hadoop-like Big Data Computing jobs, where …
In this paper, we contrast four approaches to Grid computing and discuss a computer systems approach in detail. This approach views a Grid as a distributed computer system, and its main concerns are systems abstractions and constructs, such as the Grid equivalents of computer architecture, address space, process, device, file system, user/developer’s …
Massive-scale distributed databases such as Google’s BigTable and Yahoo!’s PNUTS can be modeled as Distributed Ordered Tables, or DOTs, which partition data into regions and support range queries on keys. Multidimensional range queries on DOTs are a fundamental requirement; however, none of the existing schemes works well when three critical issues are considered: high …
The ability to find services or resources that satisfy some criteria is an important aspect of distributed systems. This paper presents an event-based architecture to support more dynamic discovery scenarios, including efficient discovery of resources whose attributes can change, and continuous monitoring for resources that satisfy a set of constraints. …
The Message Passing Interface (MPI) standard and its implementations (such as MPICH and OpenMPI) have been widely used in high-performance computing to provide an efficient communication infrastructure. This paper investigates whether MPI can be adapted to data-intensive computing to substantially speed up Hadoop and MapReduce …
In the cloud age, heterogeneous application modes on large-scale infrastructures pose challenges for resource utilization and manageability in data centers. Many resource and runtime management systems have been developed or have evolved to address these challenges and related problems from different perspectives. This paper tries to identify the main …
In theory, invoking web services from clients written in multiple languages is no longer a problem, thanks to XML-based interface descriptions in WSDL, but the reality is less favorable. Some implementation-level difficulties still exist when invoking web services from clients in different programming languages. These difficulties are caused by involving complex data structures …