Programming Distributed Memory Systems Using OpenMP

Ayon Basumallik, Seung-Jai Min, and Rudolf Eigenmann
2007 IEEE International Parallel and Distributed Processing Symposium
OpenMP has emerged as an important model and language extension for shared-memory parallel programming. On shared-memory platforms, OpenMP offers an intuitive, incremental approach to parallel programming. In this paper, we present techniques that extend the ease of shared-memory parallel programming in OpenMP to distributed-memory platforms as well. First, we describe a combined compile-time/runtime system that uses an underlying software distributed shared memory system and exploits… 


OpenMP compiler for distributed memory architectures
This paper proposes an OpenMP system, called KLCoMP, for distributed memory architectures, based on a "partially replicating shared arrays" memory model. It proposes an algorithm for shared-array recognition based on inter-procedural analysis, along with optimization and communication-generation techniques for nonlinear references.
libMPNode: An OpenMP Runtime For Parallel Processing Across Incoherent Domains
In this work we describe libMPNode, an OpenMP runtime designed for efficient multithreaded execution across systems composed of multiple non-cache-coherent domains. Rather than requiring extensive…
Control replication: compiling implicit parallelism to efficient SPMD with logical regions
Control replication is presented, a technique for generating high-performance and scalable SPMD code from implicitly parallel programs that achieves up to 99% parallel efficiency at 1024 nodes with absolute performance comparable to hand-written MPI(+X) codes.
Streams: Emerging from a Shared Memory Model
This paper presents a modest extension to OpenMP to support data partitioning and streaming, supporting both the conventional shared memory model of OpenMP and the transparent integration of local non-shared memory.
Sharing memory in modern distributed applications
This work proposes an object-based approach that leverages the features of modern object-oriented programming to intercept single operations on data, hiding the underlying run-time mechanism.
Design of Scalable Java Communication Middleware for Multi-Core Systems
This paper presents smdev, a shared memory communication middleware for multi-core systems. smdev provides a simple and powerful messaging application program interface that is able to exploit the…
Distributive Program Parallelization Using a Suggestion Language
A suggestion-based language that enables a user to parallelize a sequential program for distributed execution by inserting hints that are safe against any type of misuse and expressive enough to specify independent, pipelined, and speculative parallel execution on a cluster of multi-core computers.
Enabling Legacy Applications on Heterogeneous Platforms
By exploiting existing mechanisms found in system software, the proposed system provides a unified compute and memory view to the application programmer, and automatically schedules compute-intensive routines found in legacy code on suitable computing resources.
Distributed shared memory based on offloading to cluster network
This paper proposes high-performance DSM, called Offloaded-DSM, in which the processes of dependency analysis and communication are offloaded to the cluster network, which reduces execution time up to 32% in eight nodes and exhibits good scalability.
Piccolo: Building Fast, Distributed Programs with Partitioned Tables
Experiments show Piccolo to be faster than existing data flow models for many problems, while providing similar fault-tolerance guarantees and a convenient programming interface.


Optimizing irregular shared-memory applications for distributed-memory systems
Combined compile-time/run-time techniques for optimizing irregular shared-memory applications on message passing systems in the context of automatic translation from OpenMP to MPI are presented.
Towards automatic translation of OpenMP to MPI
Compiler techniques for translating OpenMP shared-memory parallel applications into MPI message-passing programs for execution on distributed memory systems are presented and it is found that the direct translation to MPI achieves up to 30% higher scalability.
Optimizing OpenMP Programs on Software Distributed Shared Memory Systems
Compiler techniques that can translate standard OpenMP applications into code for distributed computer systems are described, showing that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns.
An integrated compile-time/run-time software distributed shared memory system
An integrated compile-time and run-time software DSM system to make shared memory as efficient as message passing, whether hand-coded or compiler-generated, to retain its ease of programming, and to retain the broader class of applications it supports.
Compiler and software distributed shared memory support for irregular applications
This work investigates the use of a software distributed shared memory (DSM) layer to support irregular computations on distributed memory machines and finds that it has similar performance to the inspector-executor method supported by the CHAOS run-time library, while requiring much simpler compile-time support.
Enhancing software DSM for compiler-parallelized applications
  • P. Keleher, C. Tseng. Proceedings 11th International Parallel Processing Symposium, 1997.
This work demonstrates a system combining the SUIF parallelizing compiler and the CVM software DSM. It uses compiler-directed techniques that piggyback synchronization and parallelism information on parallel task invocation, and employs customized routines for evaluating reduction operations.
Portable Compilers for OpenMP
This paper presents the authors' effort to develop portable compilers for the OpenMP parallel directive language, and presents performance measurements showing that their compiler yields results comparable to those of commercial OpenMP compilers.
Compiler-directed Shared-Memory Communication for Iterative Parallel Applications
Measurements of three iterative applications show that a predictive protocol increases the number of shared-data requests satisfied locally, thus reducing the remote data access latency and total execution time.
Automatic Data Layout for Distributed-Memory Machines in the D Programming Environment
To address the difficulty of programming distributed-memory machines, researchers have proposed languages based on a global name space, annotated with directives specifying how data should be mapped onto the machine.
A synthesis of memory mechanisms for distributed architectures
This work discusses combining these mechanisms into a compiler code-generation paradigm that can succeed for many user programs; the experimental results indicate that the new paradigm would be able to support both regular and irregular code efficiently.