Programming Distributed Memory Sytems Using OpenMP
@article{Basumallik2007ProgrammingDM, title={Programming Distributed Memory Sytems Using OpenMP}, author={Ayon Basumallik and Seung-Jai Min and Rudolf Eigenmann}, journal={2007 IEEE International Parallel and Distributed Processing Symposium}, year={2007}, pages={1-8} }
OpenMP has emerged as an important model and language extension for shared-memory parallel programming. On shared-memory platforms, OpenMP offers an intuitive, incremental approach to parallel programming. In this paper, we present techniques that extend the ease of shared-memory parallel programming in OpenMP to distributed-memory platforms as well. First, we describe a combined compile-time/runtime system that uses an underlying software distributed shared memory system and exploits…
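For readers unfamiliar with the model, the kind of incremental shared-memory parallelization the paper sets out to preserve looks like the following minimal sketch (a generic OpenMP reduction loop written for this summary, not code from the paper); the single pragma is the only change to the sequential program:

```c
#include <stdio.h>

#define N 1000000

/* A typical OpenMP shared-memory loop: one directive parallelizes the
 * sequential code. The paper's compile-time/runtime techniques aim to
 * run programs written in this style on distributed-memory clusters. */
int main(void) {
    static double a[N], b[N];
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = 0.5 * b[i] + 1.0;
        sum += a[i];
    }

    printf("sum = %f\n", sum);
    return 0;
}
```

Compiled with an OpenMP-capable compiler (e.g. `gcc -fopenmp`), the loop runs in parallel on a shared-memory machine; the challenge the paper addresses is making the same source scale on distributed memory.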
64 Citations
OpenMP compiler for distributed memory architectures
- Computer Science, Science China Information Sciences
- 2010
This paper proposes an OpenMP system, called KLCoMP, for distributed memory architectures, based on the “partially replicating shared arrays” memory model, along with an algorithm for shared-array recognition based on inter-procedural analysis, plus optimization and communication-generation techniques for nonlinear references.
libMPNode: An OpenMP Runtime For Parallel Processing Across Incoherent Domains
- Computer Science, PMAM@PPoPP
- 2019
In this work we describe libMPNode, an OpenMP runtime designed for efficient multithreaded execution across systems composed of multiple non-cache-coherent domains. Rather than requiring extensive…
Control replication: compiling implicit parallelism to efficient SPMD with logical regions
- Computer Science, SC
- 2017
Control replication is presented, a technique for generating high-performance and scalable SPMD code from implicitly parallel programs that achieves up to 99% parallel efficiency at 1024 nodes with absolute performance comparable to hand-written MPI(+X) codes.
Streams: Emerging from a Shared Memory Model
- Computer Science, IWOMP
- 2008
This paper presents a modest extension to OpenMP to support data partitioning and streaming, supporting both the conventional shared-memory model of OpenMP and the transparent integration of local non-shared memory.
Sharing memory in modern distributed applications
- Computer Science, SAC
- 2016
This work proposes an object-based approach that leverages the features of modern object-oriented programming to intercept single operations on data, hiding the underlying run-time mechanism.
Design of Scalable Java Communication Middleware for Multi-Core Systems
- Computer Science, Comput. J.
- 2013
This paper presents smdev, a shared memory communication middleware for multi-core systems. smdev provides a simple and powerful messaging application program interface that is able to exploit the…
Distributive Program Parallelization Using a Suggestion Language
- Computer Science
- 2009
A suggestion-based language that enables a user to parallelize a sequential program for distributed execution by inserting hints that are safe against any type of misuse and expressive enough to specify independent, pipelined, and speculative parallel execution on a cluster of multi-core computers.
Enabling Legacy Applications on Heterogeneous Platforms
- Computer Science
- 2010
By exploiting existing mechanisms found in system software, the proposed system provides a unified compute and memory view to the application programmer, and automatically schedules compute-intensive routines found in legacy code on suitable computing resources.
Distributed shared memory based on offloading to cluster network
- Computer Science, Proceedings of 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing
- 2011
This paper proposes a high-performance DSM, called Offloaded-DSM, in which dependency analysis and communication are offloaded to the cluster network, reducing execution time by up to 32% on eight nodes and exhibiting good scalability.
Piccolo: Building Fast, Distributed Programs with Partitioned Tables
- Computer Science, OSDI
- 2010
Experiments show Piccolo to be faster than existing data flow models for many problems, while providing similar fault-tolerance guarantees and a convenient programming interface.
References
Optimizing irregular shared-memory applications for distributed-memory systems
- Computer Science, PPoPP '06
- 2006
Combined compile-time/run-time techniques for optimizing irregular shared-memory applications on message passing systems in the context of automatic translation from OpenMP to MPI are presented.
Towards automatic translation of OpenMP to MPI
- Computer Science, ICS '05
- 2005
Compiler techniques for translating OpenMP shared-memory parallel applications into MPI message-passing programs for execution on distributed-memory systems are presented; the direct translation to MPI is found to achieve up to 30% higher scalability.
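As a rough, hand-written illustration of this translation idea (a sketch assuming a block-scheduled parallel loop, not the compiler's actual output), the reduction loop shown earlier might become an SPMD MPI program in which each rank executes a contiguous block of iterations and partial sums are combined at the end:

```c
#include <mpi.h>
#include <stdio.h>

#define N 1000000

/* Sketch of an OpenMP-to-MPI translation for a loop such as
 * "#pragma omp parallel for reduction(+:sum)": iterations are
 * block-partitioned across ranks, and MPI_Allreduce combines the
 * per-rank partial sums. */
int main(int argc, char **argv) {
    int rank, size;
    static double a[N], b[N];
    double local_sum = 0.0, sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block partition of the iteration space, mirroring an OpenMP
     * static schedule. */
    int chunk = (N + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    for (int i = lo; i < hi; i++) {
        a[i] = 0.5 * b[i] + 1.0;
        local_sum += a[i];
    }

    MPI_Allreduce(&local_sum, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f\n", sum);

    MPI_Finalize();
    return 0;
}
```

The hard part, which these papers address, is not the loop partitioning itself but determining which array sections each process may touch and what data must be communicated between parallel regions.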
Optimizing OpenMP Programs on Software Distributed Shared Memory Systems
- Computer Science, International Journal of Parallel Programming
- 2004
Compiler techniques that can translate standard OpenMP applications into code for distributed computer systems are described, showing that, while kernel benchmarks can achieve high efficiency for OpenMP programs on distributed systems, full applications need careful consideration of shared-data access patterns.
An integrated compile-time/run-time software distributed shared memory system
- Computer Science, ASPLOS VII
- 1996
An integrated compile-time and run-time software DSM system that aims to make shared memory as efficient as message passing, whether hand-coded or compiler-generated, while retaining its ease of programming and the broader class of applications it supports.
Compiler and software distributed shared memory support for irregular applications
- Computer Science, PPOPP '97
- 1997
This work investigates the use of a software distributed shared memory (DSM) layer to support irregular computations on distributed memory machines and finds that it has similar performance to the inspector-executor method supported by the CHAOS run-time library, while requiring much simpler compile-time support.
Enhancing software DSM for compiler-parallelized applications
- Computer Science, Proceedings 11th International Parallel Processing Symposium
- 1997
This work demonstrates such a system by combining the SUIF parallelizing compiler with the CVM software DSM; it uses compiler-directed techniques that combine the communication of synchronization and parallelism information with parallel task invocation, and employs customized routines for evaluating reduction operations.
Portable Compilers for OpenMP
- Computer Science, WOMPAT
- 2001
This paper presents an effort to develop portable compilers for the OpenMP parallel directive language, along with performance measurements showing that the compiler yields results comparable to those of commercial OpenMP compilers.
Compiler-directed Shared-Memory Communication for Iterative Parallel Applications
- Computer Science, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing
- 1996
Measurements of three iterative applications show that a predictive protocol increases the number of shared-data requests satisfied locally, thus reducing the remote data access latency and total execution time.
Automatic Data Layout for Distributed-Memory Machines in the D Programming Environment
- Computer Science, Automatic Parallelization
- 1994
To address the difficulty of programming distributed-memory machines, researchers have proposed languages based on a global name space annotated with directives specifying how data should be mapped onto the machine.
A synthesis of memory mechanisms for distributed architectures
- Computer Science, ICS '01
- 2001
This paper discusses combining these memory mechanisms into a compiler code-generation paradigm that can succeed for many user programs; experimental results indicate that the new paradigm can support both regular and irregular code efficiently.