In this paper, we study the on-chip network and memory hierarchy design of Godson-T, a homogeneous many-core processor. Godson-T has 64 cores (each with a private L1 cache) and 16 global L2 cache banks. All these on-chip units are connected by a 2D 8 × 8 mesh network. Our study reveals that: (a) the global on-chip L2 cache can effectively alleviate the …
This paper is motivated by the desire to provide an efficient and scalable software cache implementation of OpenMP on multicore and manycore architectures in general, and on the IBM CELL architecture in particular. In this paper, we propose an instantiation of the OpenMP memory model with the following advantages: (1) The proposed instantiation prohibits …
Tiling is widely used by compilers and programmers to optimize scientific and engineering code for better performance. Many parallel programming languages support tiling directly through first-class language constructs or library routines. However, the current OpenMP programming language is tile oblivious, although it is the de facto standard for …
Programming a multicore processor is difficult. It is even more difficult if the processor has a software-managed memory hierarchy, e.g. the IBM Cyclops-64 (C64). A widely accepted parallel programming solution for multicore processors is OpenMP. Currently, all OpenMP directives are only used to decompose computation code (such as loop iterations, tasks, code …
Limits on applications and hardware technologies put a stop to the frequency race during the 2000s. Designs can now be divided into homogeneous and heterogeneous ones. Homogeneous designs are the easiest to use, since most toolchains and system software need little rewriting. On the other end of the spectrum, there are the type two …
This paper presents the design and implementation of a communication protocol for the IBM Cyclops-64 (C64) supercomputer system to enable reliable data transfer between the two major components of a C64 system: the C64 host system (also called the C64 front-end) and the C64 compute engine (also called the C64 back-end). The building block of the C64 compute engine (C64 …
GFFC (Global Feedback based Flow Control) is proposed for use in NoC design for many-core processors. GFFC is designed based on two fundamental principles: (a) when network congestion occurs, the packet sender that causes the congestion needs to know this and needs to be proactively involved in alleviating it; (b) the …
Computer architects are now studying a new generation of chip architectures that may integrate hundreds of processing cores and memory banks on a single chip with novel interconnect technologies. A key challenge lies in the design and development of an efficient on-chip shared memory organization for these future many-core architectures. New approaches need …