The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition

Ashkan Tousimojarad and Wim Vanderbauwhede
We present the Glasgow Parallel Reduction Machine (GPRM), a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the… 

GPRM: a high performance programming framework for manycore processors

A new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM), which provides high performance while maintaining ease of programming and a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations.

A Parallel Task-Based Approach to Linear Algebra

This paper highlights some of the drawbacks in the OpenMP tasking approach, and proposes an alternative model based on the Glasgow Parallel Reduction Machine (GPRM) programming framework, which is deployed to solve a fundamental linear algebra problem, LU factorisation of sparse matrices.

Comparison of Three Popular Parallel Programming Models on the Intel Xeon Phi

This study chooses the Intel Xeon Phi system as a modern platform to explore how popular parallel programming models, namely OpenMP, Intel Cilk Plus and Intel TBB (Threading Building Blocks) scale on manycore architectures.

Steal Locally, Share Globally

This paper proposes a task-based strategy called “Steal Locally, Share Globally” implemented in the runtime of the parallel programming model GPRM (Glasgow Parallel Reduction Machine), and shows that GPRM not only performs well for single workloads, but also outperforms the other models for multiprogramming workloads.

Compiling Vector Pascal to the XeonPhi

The techniques used to port the Glasgow Vector Pascal Compiler to this architecture are described and its performance is assessed by comparisons of the XeonPhi with 3 other machines running the same algorithms.

Number of Tasks, not Threads, is Key

This paper compares a purely task-centric parallel programming model called GPRM with three popular approaches (OpenMP, Intel Cilk Plus, and TBB) on two modern many-core systems, the Tilera TILEPro64 and Intel Xeon Phi, which have respectively 64 and 60 physical cores integrated into a single chip.



Intel threading building blocks - outfitting C++ for multi-core processor parallelism

This guide explains how to maximize the benefits of multi-core chips through a portable C++ library that works on Windows, Linux, Macintosh, and Unix systems, and reveals the gotchas in TBB.

Shared Memory, Message Passing, and Hybrid Merge Sorts for Standalone and Clustered SMPs

The first ones to concurrently experiment with - and compare - shared memory, message passing, and hybrid merge sort are investigated, which can help in the parallelization of specific practical merge sort routines and, even more important, in the practical parallelization of other divide-and-conquer algorithms for mainstream SMP-based systems.

A dependency-aware task-based programming environment for multi-core architectures

A programming model for those environments based on automatic function level parallelism that strives to be easy, flexible, portable, and performant is presented and it is demonstrated that it offers reasonable performance without tuning, and that it can rival highly tuned libraries with minimal tuning effort.


Design considerations of a coarse grain parallel architecture for functional languages are presented. These include extensibility, the separation of computation and control of parallelism, the

Chapel, Fortress and X10: novel languages for HPC

At the time of writing, all three languages are still in early development stages and any available compilers are experimental, so the report does not touch on language or code performance.

Scheme: An Interpreter for Extended Lambda Calculus

A completely annotated interpreter for SCHEME, written in MacLISP, is presented to acquaint programmers with the tricks of the trade of implementing non-recursive control structures in a recursive language like LISP.

Types and programming languages

This text provides a comprehensive introduction both to type systems in computer science and to the basic theory of programming languages, with a variety of approaches to modeling the features of object-oriented languages.

The Design of OpenMP Tasks

The paper summarizes the efforts of the sub-committee in designing, evaluating and seamlessly integrating the tasking model into the OpenMP specification, and compares a prototype implementation of the tasking model with existing models, and evaluates it on a wide range of applications.

The parallel graph reduction machine, Alice

This work believes that functional languages provide the most effective means for producing software and that the right approach is to develop a customised architecture for the implementation of the most suitable computational model for these languages.

A Comparison of some recent Task-based Parallel Programming Models

The need for parallel programming models that are simple to use and at the same time efficient for current and future parallel platforms has led to recent attention to task-based models such as Cil