MRPC: A High Performance RPC System for MPMD Parallel Computing
Chi-Chao Chang, G. Czajkowski, T. V. Eicken. Softw. Pract. Exp.
MRPC is an RPC system designed and optimized for MPMD parallel computing. Existing systems based on standard RPC incur an unnecessarily high cost when used on high-performance multicomputers, limiting the appeal of RPC-based languages in the parallel computing community. MRPC combines the efficient control and data transfer provided by Active Messages (AM) with a minimal multithreaded runtime system that extends AM with the features required to support MPMD. This approach introduces…
Evaluating the Performance Limitations of MPMD Communication
This paper investigates the fundamental limitations of MPMD communication through a case study of two parallel programming languages that provide support for a global name space, Compositional C++ (CC++) and Split-C, and suggests that RPC-based communication can be used effectively in many high-performance MPMD parallel applications.
Programming Support for MPMD Parallel Computing in ClusterGOP
This paper describes how ClusterGOP supports programming of MPMD parallel applications on top of MPI, discusses the issues in implementing the MPMD model in ClusterGOP using MPI, and evaluates performance using example applications.
GrADSolve: a grid-based RPC system for parallel computing with application-level scheduling
Experiments are presented to show that GrADSolve's data staging mechanisms can significantly reduce the overhead associated with data movement in current RPC systems, and to demonstrate the usefulness of the execution traces maintained by GrADSolve for problem solving.
Safe and efficient cluster communication in java using explicit memory management
In-place object de-serialization (de-serialization without allocation and copying of objects) is proposed to further enhance the performance of RMI on homogeneous clusters; it takes advantage of the zero-copy capabilities of network devices to reduce per-object de-serialization costs to a constant irrespective of object size.
Ninf-G: A Reference Implementation of RPC-based Programming Middleware for Grid Computing
Ninf-G is a reference implementation of the GridRPC API, which has been proposed for standardization at the Global Grid Forum; it provides a simple, easy-to-use programming interface based on standard Grid protocols and APIs for Grid computing.
A Preemption-Based Meta-Scheduling System for Distributed Computing
The scheduling framework developed in this research arbitrates between the application-level schedules of different applications, providing fair system usage and balancing the interests of competing applications.
A GridRPC Model and API for End-User Applications
The goal of this document is to clearly and unambiguously define the syntax and semantics of GridRPC, thereby enabling a growing user base to take advantage of multiple implementations and to facilitate the development of multiple implementations.
Overview of GridRPC: A Remote Procedure Call API for Grid Computing
Initial work on GridRPC shows that client access to existing grid computing systems such as NetSolve and Ninf can be unified via a common API, a task that has proven to be problematic in the past.
Compiler and runtime support for the execution of scientific codes with unstructured datasets on heterogeneous parallel architectures
The paper proposes a methodology to reduce the size of the datasets and transfer them efficiently to the FPGA, as well as two compiler and runtime techniques that automate the parallelization of codes for heterogeneous systems: one focused on control-flow distribution and another based on pipelining of loop sequences.
Optical interconnects in systems
  • A. Levi
  • Proceedings of the IEEE
  • 2000
Future enhancement of system performance will decreasingly rely on reduction in transistor dimensions; rather, performance gains will increasingly come from improved hardware and software.


CRL: high-performance all-software distributed shared memory
Results from the first completely controlled comparison of scalable hardware and software DSM systems are presented; they indicate that CRL can deliver performance competitive with hardware DSM systems, and suggest that in many cases special-purpose hardware support for shared memory may not be necessary.
Performance implications of communication mechanisms in all-software global address space systems
This study compares the mechanisms in two representative all-software systems, CRL and Split-C, and finds the programming complexity of the communication mechanisms in the two languages to be comparable.
Low-Latency Communication on the IBM RISC System/6000 SP
This paper describes an implementation of Active Messages (SP AM) layered directly on top of the SP's network adapter (TB2); an MPI implementation based on the freely available MPICH achieves performance equivalent to IBM's MPI-F on the NAS benchmarks.
Performance of a High-Level Parallel Language on a High-Speed Network
The authors implemented a portable runtime system for an object-based language (Orca) on a collection of processors connected by a Myrinet network, optimizing message handling, multicasting, buffer management, fragmentation, marshalling, and various other aspects.
Performance of Firefly RPC
This paper reports on the performance of the remote procedure call (RPC) implementation for the Firefly multiprocessor, analyzes the implementation to account precisely for all measured latency, and estimates how much faster RPC could be if certain improvements were made.
Active Messages: A Mechanism for Integrated Communication and Computation
It is shown that active messages are sufficient to implement the dynamically scheduled languages for which message-driven machines were designed, and that with this mechanism latency tolerance becomes a programming/compiling concern.
The PVM Concurrent Computing System: Evolution, Experiences, and Trends
The architecture of the PVM system is described, along with its computing model, the programming interface it supports, auxiliary facilities for process groups and MPP support, and some of the internal implementation techniques employed.
Experience with active messages on the Meiko CS-2
This paper's implementation of active messages achieves a one-way latency of 12.3 μs and up to 39 MB/s for bulk transfers, close to optimal for the current Meiko hardware and competitive with the performance of active messages on other platforms.
User-level interprocess communication for shared memory multiprocessors
User-Level Remote Procedure Call (URPC) combines a fast cross-address-space communication protocol using shared memory with lightweight threads managed at the user level, allowing the kernel to be bypassed during cross-address-space communication.
Improving IPC by kernel design
The main ideas are to guide the complete kernel design by the IPC requirements, and to make heavy use of the concept of virtual address spaces inside the μ-kernel itself.