Compiling programs for distributed-memory multiprocessors

  • David Callahan, Ken Kennedy
  • The Journal of Supercomputing
We describe a new approach to programming distributed-memory computers. Rather than having each node in the system explicitly programmed, we derive an efficient message-passing program from a sequential shared-memory program annotated with directions on how elements of shared arrays are distributed to processors. This article describes one possible input language for describing distributions and then details the compilation process and the optimization necessary to generate an efficient program… 
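The distribution annotations the abstract describes map each element of a shared array to an owning processor. A minimal sketch of one common choice, a BLOCK distribution, follows; the function names and layout are illustrative, not taken from the paper:

```python
# Sketch of a BLOCK distribution: P processors each own one contiguous
# chunk of an N-element array. Names here are illustrative only.

def block_owner(i, n, p):
    """Return the processor that owns element i of an n-element array
    distributed BLOCK-wise over p processors."""
    block = (n + p - 1) // p          # ceiling(n / p) elements per processor
    return i // block

def local_index(i, n, p):
    """Translate global index i into the owner's local index."""
    block = (n + p - 1) // p
    return i % block

# With n=100 elements on p=4 processors, element 37 lives on
# processor 1 at local index 12.
```

A compiler with this mapping in hand can decide, for each reference, whether the data is local or must be fetched by a message from the owner.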


A novel approach to automatic data partitioning that introduces the notion of constraints on data distribution and shows how a parallelizing compiler can infer those constraints from the data reference patterns in the program's source code.


An introduction to compilation issues for parallel machines

Although tremendous advances have been made in dependence theory and in the development of a “toolkit” of transformations, parallel systems are used most effectively when the programmer interacts in the optimization process.

Compiler technology for machine-independent parallel programming

  • K. Kennedy
  • Computer Science
    International Journal of Parallel Programming
  • 2007
Historically, the principal achievement of compiler technology has been to make it possible to program in a high-level, machine-independent style. The absence of compiler technology to provide such a…

Tools for Developing and Analyzing Parallel For

This paper discusses several compile-time optimization techniques used in PYRROS and the related issues of partitioning and "owner computes rule" are discussed and the importance of program scheduling is demonstrated.
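The "owner computes rule" mentioned here is the convention that each processor executes only the loop iterations whose left-hand-side element it owns. A hedged sketch, simulating two processors over one shared list (the ownership layout and names are illustrative, not the paper's algorithm):

```python
# Owner-computes sketch: each processor updates only its own BLOCK of a,
# masking out all other iterations of the loop a[i] = b[i] + 1.

def owner_computes(rank, p, a, b):
    """Processor `rank` of `p` executes only the iterations it owns."""
    n = len(a)
    block = (n + p - 1) // p
    lo, hi = rank * block, min((rank + 1) * block, n)
    for i in range(lo, hi):          # iterations masked by ownership
        a[i] = b[i] + 1

# Simulating p=2 processors over one shared list:
a = [0] * 6
b = [10, 20, 30, 40, 50, 60]
for rank in range(2):
    owner_computes(rank, 2, a, b)
# a is now [11, 21, 31, 41, 51, 61]
```

On a real distributed-memory machine each rank would hold only its local block, and remote right-hand-side operands would arrive by message; the masking logic is the same.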

An array language for data parallelism: Definition, compilation, and applications

This work proposes an array language that captures many of the abstractions necessary for the effective programming of SIMD machines, thereby liberating the user from having to specify low-level details, and allows for efficient compilation using state-of-the-art techniques, achieving hand-coded quality.

Automating the Coordination of Interprocessor Communication, October 29, 1990

This paper presents methods for ensuring correct synchronization and scheduling of message-passing in the context of compiling shared-memory programs onto distributed-memory machines. We show that…

Pattern Driven Automatic Parallelization

This paper describes a knowledge based system for automatic parallelization of a wide class of sequential numeric codes operating on vectors and dense matrices and for execution on distributed memory…


A new programming environment for distributed-memory architectures is presented, providing a global name space and allowing direct access to remote parts of data values; the efficiency of the resulting code is demonstrated on the NCUBE/7 and iPSC/2 hypercubes.

Compiling Fortran D

This work proposes to solve the problem of programming parallel machines by developing the compiler technology needed to establish a machine-independent programming model, one that is easy to use yet performs with acceptable efficiency on different parallel architectures, at least for data-parallel scientific codes.



An Overview of Dino - A New Language for Numerical Computation on Distributed Memory Multiprocessors

The authors' approach is to add several high-level constructs to the standard C programming language that allow the programmer to describe the parallel algorithm to the computer in a natural way, similar to how the algorithm designer might informally describe it.

Programming for Parallelism

  • A. Karp
  • Computer Science
  • 1987
In the last few years we have seen an explosion in the interest in and availability of parallel processors and a corresponding expansion in applications programming activity. Clearly, applications…

Automatic decomposition of scientific programs for parallel execution

An algorithm for transforming sequential programs into equivalent parallel programs is presented, and the problem of generating optimal code when loop interchange is employed is shown to be intractable.

SUPERB: A tool for semi-automatic MIMD/SIMD parallelization

The S/Net's Linda kernel

The implementation suggests that Linda's unusual shared-memory-like communication primitives can be made to run well in the absence of physically shared memory; the simplicity of the language and of the implementation's logical structure suggest that similar Linda implementations might readily be constructed on related architectures.

Advanced compiler optimizations for supercomputers

Compilers for vector or multiprocessor computers must include certain optimization features in order to generate parallel code that runs efficiently on parallel systems.

Parallel Programming Support in ParaScope

The first vector supercomputers appeared on the market in the early to mid-seventies. Yet, because of the lag in developing supporting software, it is only recently that vectorizing compilers…

Domain Decomposition in Distributed and Shared Memory Environments. I: A Uniform Decomposition and Performance Analysis for the NCUBE and JPL Mark IIIfp Hypercubes

We describe how explicit domain decomposition can lead to implementations of large-scale scientific applications which run with near optimal performance on concurrent supercomputers with a variety of…

Automatic translation of FORTRAN programs to vector form

The theoretical background is developed here for employing data dependence to convert FORTRAN programs to parallel form and transformations that use dependence to uncover additional parallelism are discussed.
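One classic data dependence test used in such vectorizers is the GCD test: a dependence between references A[a*i + b] and A[c*i + d] can exist only if gcd(a, c) divides (d - b). This is a standard textbook test, sketched here as an illustration, not the paper's full algorithm:

```python
# GCD dependence test sketch: conservative, so True means a dependence
# is POSSIBLE (may be a false positive), False means it is ruled out.

from math import gcd

def gcd_test(a, b, c, d):
    """Can a dependence exist between A[a*i + b] and A[c*i + d]?
    True iff gcd(a, c) divides (d - b), assuming a and c are nonzero."""
    return (d - b) % gcd(a, c) == 0

# A[2*i] vs A[2*i + 1]: gcd(2, 2) = 2 does not divide 1,
# so these references can never touch the same element.
```

When the test rules a dependence out, the loop carrying those references can safely be converted to vector form.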