OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

@article{Richie2016OpenCLO,
  title={OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture},
  author={David A. Richie and James A. Ross},
  journal={ArXiv},
  year={2016},
  volume={abs/1608.03549}
}
There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with minimal uncore functionality connected by a 2D mesh Network-on-Chip (NoC). The Epiphany architecture… 

Development and Application of a Hybrid Programming Environment on an ARM/DSP System for High Performance Computing

TLDR
A hybrid programming environment that combines OpenMP, OpenCL and MPI to enable application execution across multiple Brown-Dwarf nodes is demonstrated, and results indicate that the Brown-Dwarf system remains competitive with contemporary systems for memory-bound computations.

Developing Scientific Software for Low-power System-on-Chip Processors: Optimising for Energy

TLDR
A high-resolution, non-intrusive, energy measurement framework along with an Application Programming Interface (API) which enables an application to obtain real-time measurement of its energy usage at the function level.

Scratchpad-Memory Management for Multi-Threaded Applications on Many-Core Architectures

TLDR
Coordinated Data Management (CDM), a compile-time framework that automatically identifies shared/private variables and places them, with replication if necessary, in suitable on-chip or off-chip memory while taking NoC contention into consideration, is presented.

Remote Procedure Calls for Improved Data Locality with the Epiphany Architecture

TLDR
The software implementation of an emerging parallel programming model for partitioned global address space (PGAS) architectures designed for the low-power Adapteva Epiphany architecture is described.

Epiphany-V: A 1024 processor 64-bit RISC System-On-Chip

This paper describes the design of a 1024-core processor chip in 16nm FinFet technology. The chip ("Epiphany-V") contains an array of 1024 64-bit RISC processors, 64MB of on-chip SRAM, three 136-bit

Architecture Emulation and Simulation of Future Many-Core Epiphany RISC Array Processors

TLDR
An Epiphany SoC device emulator is developed that can be installed as a virtual device on an ordinary x86 platform and utilized with the existing software stack used to support physical devices, thus creating a seamless software development environment capable of targeting new processor designs just as they would be interfaced on a real platform.

References

Programming the Adapteva Epiphany 64-core network-on-chip coprocessor

TLDR
This paper evaluates the performance of a 64-core Epiphany system with a variety of basic compute and communication micro-benchmarks and implements two well-known application kernels: a 5-point star-shaped heat stencil with a peak performance of 65.2 GFLOPS and matrix multiplication with 65.3 GFLOPS in single precision.

Hybrid Programming Using OpenSHMEM and OpenACC

TLDR
This paper uses the NAS-BT Multi-zone benchmark, converted to use the OpenSHMEM library API for network communication between nodes and OpenACC to exploit accelerators present within a node, to explore the use of OpenACC directives to program GPUs and the use of OpenSHMEM, a PGAS library for one-sided communication between nodes.

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

TLDR
The implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer shows that the physical topology and memory-mapped capabilities of the core and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD execution with SHMEM.

The Landscape of Parallel Computing Research: A View from Berkeley

TLDR
The parallel landscape is framed with seven questions, and the following are recommended to explore the design space rapidly: • The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems • The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.

Kickstarting high-performance energy-efficient manycore architectures with Epiphany

TLDR
Epiphany is introduced as a high-performance energy-efficient manycore architecture suitable for real-time embedded systems and achieves 50 GFLOPS/W in 28 nm technology, making it suitable for high performance streaming applications like radio base stations and radar signal processing.

On-Chip Interconnection Architecture of the Tile Processor

IMesh, the tile processor architecture's on-chip interconnection network, connects the multicore processor's tiles with five 2D mesh networks, each specialized for a different use. taking advantage

A 16-issue multiple-program-counter microprocessor with point-to-point scalar operand network

  • M.B. Taylor, J. Kim, A. Agarwal
  • Computer Science
    2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC.
  • 2003
TLDR
This microprocessor explores an architectural solution to scalability problems in scalar operand networks by using an on-chip point-to-point scalar operand network to transfer operands among distributed functional units.

OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

The OpenCL standard offers a common API for program execution on systems composed of different types of computational devices such as multicore CPUs, GPUs, or other accelerators.

Thousand Core Chips: A Technology Perspective

  • S. Borkar
  • Computer Science
    2007 44th ACM/IEEE Design Automation Conference
  • 2007
TLDR
The many-core architecture, with hundreds to thousands of small cores, is presented to deliver unprecedented compute performance in an affordable power envelope, and fine-grain power management, memory bandwidth, on-die networks, and system resiliency are discussed.

A cellular computer to implement the Kalman filter algorithm

The subject of this thesis is the development of the design for a specially-organized, general-purpose computer which performs matrix operations efficiently. The content of the thesis is summarized