Threaded MPI programming model for the Epiphany RISC array processor

@article{Richie2015ThreadedMP,
  title={Threaded MPI programming model for the Epiphany RISC array processor},
  author={David A. Richie and James A. Ross and Song Jun Park and Dale R. Shires},
  journal={J. Comput. Sci.},
  year={2015},
  volume={9},
  pages={94--100}
}

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI

TLDR
This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard that enables MPI codes to execute on the RISC array processor with little modification and achieve high performance.

Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor

Building a Parallella board cluster

TLDR
Experimental results show that the Epiphany chip performs very well compared with other energy-efficient chips such as the Cortex A9 ARM, with an 11× speedup, and the cluster of four Parallella boards against an Intel i5 3570 running a single thread.

Architecture Emulation and Simulation of Future Many-Core Epiphany RISC Array Processors

TLDR
An Epiphany SoC device emulator is developed that can be installed as a virtual device on an ordinary x86 platform and utilized with the existing software stack used to support physical devices, thus creating a seamless software development environment capable of targeting new processor designs just as they would be interfaced on a real platform.

Constructing a low cost and low powered cluster with Parallella boards

TLDR
Experimental results show that the Epiphany chip performs very well compared with other energy-efficient chips such as the Cortex A9 ARM, with an 11× speedup, but sees a drop in speed when attempting complex arithmetic operations compared with the other processors, owing to the lack of hardware support.

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

TLDR
The implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer shows that the physical topology and memory-mapped capabilities of the core and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD execution with SHMEM.

Heterogeneous Computing Platform for data processing

  • S. Prongnuch, T. Wiangtong
  • Computer Science
    2016 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)
  • 2016
TLDR
In the Parallella single-board computer, a PR hardware accelerator on the Zynq-7000 SoC is created and compared with the use of the Epiphany 16-core co-processor, and results show that as the amount of data to process increases, the PR hardware accelerator is the most promising option for running the platform efficiently.

Developing Scientific Software for Low-power System-on-Chip Processors: Optimising for Energy

TLDR
A high-resolution, non-intrusive, energy measurement framework along with an Application Programming Interface (API) which enables an application to obtain real-time measurement of its energy usage at the function level.

References

Programming the Adapteva Epiphany 64-core network-on-chip coprocessor

TLDR
This paper evaluates the performance of a 64-core Epiphany system with a variety of basic compute and communication micro-benchmarks and implements two well-known application kernels: a 5-point star-shaped heat stencil with a peak performance of 65.2 GFLOPS and matrix multiplication with 65.3 GFLOPS in single precision.

A Threads-Only MPI Implementation for the Development of Parallel Programs

TLDR
A C/C++ preprocessor is provided to modify the semantics of global variables so that each thread appears to have its own address space, which makes TOMPI a true MPI implementation; that is, MPI programs can be run in TOMPI without modification.

Realizing Efficient Execution of Dataflow Actors on Manycores

TLDR
This paper proposes to use a compilation tool with two intermediate representations for CAL to generate an Action Execution Intermediate Representation that is closer to a sequential imperative language like C and Java.

An evaluation of code generation of dataflow languages on manycore architectures

TLDR
Several optimizations in the code generation as well as in the communication library are described, and it is observed that the most critical optimization is reducing the number of external memory accesses.

Kickstarting high-performance energy-efficient manycore architectures with Epiphany

TLDR
Epiphany is introduced as a high-performance, energy-efficient manycore architecture suitable for real-time embedded systems and achieves 50 GFLOPS/W in 28 nm technology, making it suitable for high-performance streaming applications like radio base stations and radar signal processing.

On-Chip Interconnection Architecture of the Tile Processor

IMesh, the tile processor architecture's on-chip interconnection network, connects the multicore processor's tiles with five 2D mesh networks, each specialized for a different use. taking advantage

Learning from the Success of MPI

TLDR
This paper argues that MPI has succeeded because it addresses all of the important issues in providing a parallel programming model.

A 16-issue multiple-program-counter microprocessor with point-to-point scalar operand network

  • M.B. Taylor, J. Kim, A. Agarwal
  • Computer Science
    2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC.
  • 2003
TLDR
This microprocessor explores an architectural solution to scalability problems in scalar operand networks by using an on-chip point-to-point scalar operand network to transfer operands among distributed functional units.

Are Your Passwords Safe: Energy-Efficient Bcrypt Cracking with Low-Cost Parallel Hardware

TLDR
Proposed implementations of bcrypt were integrated into John the Ripper password cracker resulting in improved energy efficiency by a factor of 35+ compared to heavily optimized implementations on modern CPUs.

A cellular computer to implement the Kalman filter algorithm

The subject of this thesis is the development of the design for a specially-organized, general-purpose computer which performs matrix operations efficiently. The content of the thesis is summarized