Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI

  • James A. Ross, David A. Richie, Song Jun Park, Dale R. Shires
  • Published 13 June 2015
  • Computer Science
  • Proceedings of the 3rd International Workshop on Many-core Embedded Systems
The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. […] Using MPI exploits the similarities between the Epiphany architecture and a conventional parallel distributed cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms…
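The analogy between a 2D tiled mesh of cores and a cluster of MPI ranks can be illustrated with a minimal sketch of a Cartesian rank layout. This is not the paper's implementation: the helper names, the row-major rank ordering, and the 4x4 mesh size are all assumptions chosen for illustration.

```python
# Hypothetical helpers: the row-major layout and 4x4 mesh size are assumptions,
# not details taken from the paper.
ROWS, COLS = 4, 4


def rank_to_coords(rank, cols=COLS):
    """Map a linear rank to (row, col) on a row-major 2D mesh."""
    return divmod(rank, cols)


def coords_to_rank(row, col, rows=ROWS, cols=COLS):
    """Map (row, col) back to a rank, or None if the tile lies off the mesh."""
    if 0 <= row < rows and 0 <= col < cols:
        return row * cols + col
    return None


def mesh_neighbors(rank, rows=ROWS, cols=COLS):
    """Return the north/south/west/east neighbor ranks (None at a mesh edge)."""
    row, col = rank_to_coords(rank, cols)
    return {
        "north": coords_to_rank(row - 1, col, rows, cols),
        "south": coords_to_rank(row + 1, col, rows, cols),
        "west": coords_to_rank(row, col - 1, rows, cols),
        "east": coords_to_rank(row, col + 1, rows, cols),
    }


if __name__ == "__main__":
    for r in (0, 5, 15):
        print(r, rank_to_coords(r), mesh_neighbors(r))
```

In a real MPI program the same bookkeeping is usually delegated to the standard Cartesian topology routines rather than written by hand.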

Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor

ePython: An Implementation of Python for the Many-Core Epiphany Co-processor

  • Nick Brown
  • Computer Science
    2016 6th Workshop on Python for High-Performance and Scientific Computing (PyHPC)
  • 2016
The result of this work is support for developing Python on the Epiphany, applicable to other similar architectures, which the community has already started to adopt and use to explore concepts of parallelism and HPC.

Domain-Decomposition Parallelization for Molecular Dynamics Algorithm with Short-Ranged Potentials on Epiphany Architecture

This paper uses LAMMPS running on one 64-bit ARMv8 Cortex-A53 CPU core as a reference for assessing both the accuracy and the computational efficiency of the presented variant of the molecular dynamics algorithm for Epiphany.

Efficient parallel execution of genetic algorithms on Epiphany manycore processor

  • Lukasz Faber, K. Boryczko
  • Computer Science
    2016 Federated Conference on Computer Science and Information Systems (FedCSIS)
  • 2016
This paper evaluates Parallella, a small board with the Epiphany manycore coprocessor consisting of sixteen MIMD cores connected by a mesh network-on-a-chip, and achieves significant speed improvements.

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

The implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer shows that the physical topology and memory-mapped capabilities of the core and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD execution with SHMEM.

Generation of the Single Precision BLAS Library for the Parallella Platform, with Epiphany Co-processor Acceleration, Using the BLIS Framework

  • Miguel Tasende
  • Computer Science
    2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech)
  • 2016
The main purpose of this work was to get closer to practical Linear Algebra applications for the entire Parallella platform, with Scientific Computing and, in the long run, Big Data applications in view.

Implementing Hilbert transform for Digital Signal Processing on epiphany many-core coprocessor

This paper discusses an implementation of the Hilbert filter using the COPRTHR 2.0 SDK, which includes a Pthread-like interface for offloading the thread function, and presents timing and performance results for the implementation.

Bulk-synchronous pseudo-streaming algorithms for many-core accelerators

The bulk-synchronous parallel (BSP) model is extended to support pseudo-streaming algorithms for accelerators, and the BSP cost function is generalized to these algorithms, so that it is possible to predict the running time for programs targeting many-core accelerators and to identify possible bottlenecks.
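For orientation, the classical BSP cost function that such work builds on charges each superstep w + h·g + l, where w is the maximum local work on any processor, h is the maximum number of words any processor sends or receives, g is the per-word communication cost, and l is the synchronization latency. The sketch below implements only this classical form, not the generalized pseudo-streaming cost from the paper; the function names and sample numbers are illustrative assumptions.

```python
# Classical BSP cost only; the paper's generalized pseudo-streaming cost is
# not reproduced here. Function names and sample values are hypothetical.

def superstep_cost(w, h, g, l):
    """Cost of one superstep: local work + h-relation cost + barrier latency."""
    return w + h * g + l


def program_cost(supersteps, g, l):
    """Total cost of a program given (w, h) pairs, one per superstep."""
    return sum(superstep_cost(w, h, g, l) for w, h in supersteps)


if __name__ == "__main__":
    # Two supersteps: one compute-heavy, one communication-heavy.
    steps = [(1000, 50), (400, 200)]
    print(program_cost(steps, g=4, l=100))
```

Comparing the w term with the h·g term for each superstep is one simple way to see whether computation or communication dominates.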

Energy Efficiency of Epiphany Many-Core Architecture for Parallel Molecular Dynamics Calculations

A comparison of the energy consumption and performance of the Parallella board with the Epiphany coprocessor against a modern general-purpose Cortex-A53 processor shows the advantage of the Parallella platform, while there are still opportunities to improve the software.



Threaded MPI programming model for the Epiphany RISC array processor

Programming the Adapteva Epiphany 64-core network-on-chip coprocessor

This paper evaluates the performance of a 64-core Epiphany system with a variety of basic compute and communication micro-benchmarks and implements two well-known application kernels: a 5-point star-shaped heat stencil with a peak performance of 65.2 GFLOPS and matrix multiplication with 65.3 GFLOPS in single precision.
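A 5-point star-shaped stencil updates each interior grid point from itself and its four nearest neighbors. The following is a minimal serial sketch assuming the standard explicit heat update u + alpha·(north + south + west + east − 4u); it is not the paper's optimized Epiphany kernel, and the function name and the default alpha are illustrative assumptions.

```python
# Serial reference sketch of a 5-point heat stencil step. The name heat_step
# and the coefficient alpha are assumptions, not taken from the paper.

def heat_step(grid, alpha=0.25):
    """Apply one explicit 5-point update to the interior; boundaries stay fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so boundary values carry over
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbors = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1])
            new[i][j] = grid[i][j] + alpha * (neighbors - 4.0 * grid[i][j])
    return new


if __name__ == "__main__":
    hot_edges = [[1.0, 1.0, 1.0],
                 [1.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0]]
    print(heat_step(hot_edges))
```

On a many-core mesh, each core would typically own a sub-block of the grid and exchange only the edge rows and columns with its mesh neighbors before each step.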

Evaluation and improvements of programming models for the Intel SCC many-core processor

The first experiences gained while developing low-level software for message-passing and shared-memory programming on the Single-chip Cloud Computer (SCC) are detailed, the potential of both programming models is evaluated, and how these models can be improved, especially with respect to the SCC's many-core architecture, is examined.

Platform 2012, a many-core computing accelerator for embedded SoCs: Performance evaluation of visual analytics applications

P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous processor clusters, and a dedicated version of the OpenCV vision library is provided in the P2012 SW Development Kit to enable visual analytics acceleration.

Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP

This paper presents a programming model, compiler and runtime system for a prototype board from STMicroelectronics featuring an ARM9 host and a STHORM many-core accelerator, based on OpenMP, with additional directives to efficiently program the accelerator from a single host program.

The 48-core SCC Processor: the Programmer's View

  • T. Mattson, Michael Riepen, S. Dighe
  • Computer Science
    2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2010
The programmer's view of this chip is described, along with RCCE, the native message-passing model created for the SCC processor, an intermediate case sharing traits of message-passing and shared-memory architectures.

A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS

This paper presents a prototype chip that integrates 48 Pentium™ class IA-32 cores on a 6×4 2D-mesh network of tiled core clusters with high-speed I/Os on the periphery to realize a data-center-on-a-die microprocessor architecture.

Toward Efficient Support for Multithreaded MPI Communication

This paper presents four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity and correspondingly increasing levels of complexity, and presents performance results with a message-rate benchmark to demonstrate the performance implications of the different approaches.

RCKMPI - Lightweight MPI Implementation for Intel's Single-chip Cloud Computer (SCC)

This paper presents an MPI implementation (RCKMPI) that uses an efficient mix of MPB and DDR3 shared memory for low-level communication, which results in transmission times equal to or lower than when communicating through the on-die buffer alone.