A GALS Infrastructure for a Massively Parallel Multiprocessor

@article{Plana2007AGI,
  title={A GALS Infrastructure for a Massively Parallel Multiprocessor},
  author={Luis A. Plana and Stephen B. Furber and Steve Temple and Muhammad Mukaram Khan and Yebin Shi and Jian Wu and Shufan Yang},
  journal={IEEE Design \& Test of Computers},
  year={2007},
  volume={24}
}
This case study focuses on a massively parallel multiprocessor for real-time simulation of billions of neurons. Each node of the design comprises 20 ARM9 cores, a memory interface, a multicast router, and two NoC structures that connect the cores to each other and to the environment. The NoCs are asynchronous; the cores and RAM interfaces are synchronous. This GALS approach decouples clocking concerns for the different parts of the die, leading to greater power efficiency.
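
As a rough data model of that composition, the plain-C sketch below lays out one node's building blocks and their clock domains. All type and field names are illustrative assumptions (the paper does not give this structure), and the router is simply treated here as locally clocked.

/* Minimal sketch only: a data model of the node composition described in
 * the abstract (20 ARM9 cores, a memory interface, a multicast router,
 * two NoCs). Names and the router's clock-domain assignment are
 * assumptions, not SpiNNaker's actual hardware description. */
#include <stddef.h>

#define CORES_PER_NODE 20u

enum clock_domain { SYNCHRONOUS, ASYNCHRONOUS };

struct arm9_core        { unsigned id; enum clock_domain domain; };
struct memory_interface { enum clock_domain domain; };
struct multicast_router { enum clock_domain domain; };
struct noc              { const char *name; enum clock_domain domain; };

struct spinnaker_node {
    struct arm9_core        cores[CORES_PER_NODE];  /* synchronous islands   */
    struct memory_interface mem_if;                 /* synchronous island    */
    struct multicast_router router;                 /* assumed locally clocked */
    struct noc              nocs[2];                /* asynchronous fabrics  */
};

/* Populate a node the way the GALS scheme partitions it: cores and the RAM
 * interface keep local clocks, while both NoCs are self-timed. */
static struct spinnaker_node make_node(void)
{
    struct spinnaker_node n = {
        .mem_if = { SYNCHRONOUS },
        .router = { SYNCHRONOUS },
        .nocs   = { { "system_noc", ASYNCHRONOUS },
                    { "comms_noc",  ASYNCHRONOUS } },
    };
    for (unsigned i = 0; i < CORES_PER_NODE; ++i)
        n.cores[i] = (struct arm9_core){ i, SYNCHRONOUS };
    return n;
}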

Configuring a Large-Scale GALS System

A novel asynchronous, event-driven boot-up process efficiently configures the SpiNNaker chips and uses a high-speed flood-fill mechanism to load the application onto a system of up to a million embedded processors in a robust and scalable way.
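
The flood-fill idea lends itself to a compact sketch: each node stores a boot-image block the first time it arrives and forwards it once on every inter-chip link, so duplicates are suppressed and the image spreads to every reachable node. The C sketch below is a minimal illustration under assumed names and an assumed six-link topology; it is not the actual SpiNNaker boot protocol.

#include <stdbool.h>
#include <string.h>

#define NUM_LINKS   6     /* assumed chip-to-chip links per node */
#define MAX_BLOCKS  64
#define BLOCK_WORDS 256

struct boot_block { unsigned seq; unsigned words[BLOCK_WORDS]; };

struct node {
    bool         seen[MAX_BLOCKS];               /* blocks already received  */
    unsigned     image[MAX_BLOCKS][BLOCK_WORDS]; /* assembled boot image     */
    struct node *link[NUM_LINKS];                /* neighbours, NULL if none */
};

/* Event handler invoked when a block arrives on any link. */
void on_boot_block(struct node *n, const struct boot_block *b)
{
    if (b->seq >= MAX_BLOCKS || n->seen[b->seq])
        return;                                  /* out of range or duplicate */

    n->seen[b->seq] = true;
    memcpy(n->image[b->seq], b->words, sizeof n->image[b->seq]);

    /* Forward once to every neighbour. A real event-driven system would
     * enqueue these sends; direct recursion keeps the sketch short. */
    for (int i = 0; i < NUM_LINKS; ++i)
        if (n->link[i])
            on_boot_block(n->link[i], b);
}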

A communication infrastructure for a million processor machine

SpiNNaker (Spiking Neural Network architecture) is a massively parallel computing machine comprising a million ARM9 cores, realised on 50,000 chips with 20 cores per chip.

Globally Asynchronous Locally Synchronous Simulation of NoCs on Many-Core Architectures

This work identifies conceptual drawbacks of state-of-the-art parallel simulation approaches and proposes a novel globally asynchronous locally synchronous (GALS) simulation concept suited to many-core architectures, yielding a speedup of up to 2.3 over parallel discrete-event simulation.

Performance Evaluation and Scaling of a Multiprocessor Architecture Emulating Complex SNN Algorithms

This paper reports the performance analysis of an efficient multiprocessor architecture for accelerating the emulation of large-scale Spiking Neural Networks (SNNs), and demonstrates that the system can emulate up to 10,000 300-synapse neurons in real time at 64 MHz on conventional FPGAs.

SpiNNaker: A multi-core System-on-Chip for massively-parallel neural net simulation

The SpiNNaker multicore System-on-Chip is a Globally Asynchronous Locally Synchronous system with 18 ARM968 processor nodes residing in synchronous islands, surrounded by a lightweight, packet-switched asynchronous communications infrastructure; it met its power and performance requirements.

Biologically-Inspired Massively-Parallel Architectures - Computing Beyond a Million Processors

  • S. Furber
  • 2009 Ninth International Conference on Application of Concurrency to System Design
  • 2009
The SpiNNaker project aims to develop parallel computer systems with more than a million embedded processors. The goal of the project is to support large-scale simulations of systems of spiking neurons.

A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors

A new asynchronous interconnection network is introduced for globally-asynchronous locally-synchronous (GALS) chip multiprocessors. The network eliminates the need for global clock distribution.

Asynchronous Communications for NoCs

Technology scaling beyond 90 nm drastically complicates the chip design process: with global clocking it becomes very difficult to improve power and performance while keeping acceptable levels of robustness to faults, both at fabrication and at run time.

A hierarchical configuration system for a massively parallel neural hardware platform

The PArtitioning and Configuration MANager provides automated hardware acceleration for some commonly used network simulators, while also pointing towards the advantages of hierarchical configuration for large, domain-specific hardware systems.
...

References

Chain: A Delay-Insensitive Chip Area Interconnect

The increasing complexity of system-on-chip designs exposes the limits imposed by the standard synchronous bus, and a mixed system is proposed as a solution.

Delay-insensitive, point-to-point interconnect using m-of-n codes

This paper presents a new method for selecting suitable mappings through the decomposition of the complex m-of-n code into an incomplete m-of-n code constructed from groups of smaller, simpler m-of-n and 1-of-n codes.
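
The validity rule behind such codes is simple enough to show directly: a codeword on n wires is complete exactly when m of them are asserted, so completion can be detected by counting ones. The toy C check below illustrates only that rule (function names are ours), not the paper's mapping or decomposition scheme. For example, a 2-of-4 code carries one of C(4,2) = 6 symbols per codeword.

#include <stdbool.h>
#include <stdint.h>

/* Count set bits without relying on compiler builtins. */
static unsigned popcount32(uint32_t x)
{
    unsigned c = 0;
    while (x) { x &= x - 1u; ++c; }   /* clear the lowest set bit each pass */
    return c;
}

/* True when exactly m of the n wire signals (bits 0..n-1) are asserted. */
static bool is_valid_m_of_n(uint32_t wires, unsigned m, unsigned n)
{
    uint32_t mask = (n >= 32u) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return popcount32(wires & mask) == m;
}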

Delay-insensitive codes — an overview

It appears that delay-insensitive codes are equivalent to antichains in partially ordered sets and to all unidirectional error-detecting codes.

Neural systems engineering

The sheer scale and complexity of the human brain still defy attempts to model it in its entirety at the neuronal level, but Moore's Law is closing this gap and machines with the potential to emulate the brain are no more than a decade or so away.

Globally Asynchronous, Locally Synchronous Design and Test

  • IEEE Design & Test of Computers
  • 2008

Advanced Microcontroller Bus Architecture (AMBA) Specification, Rev. 2.0, ARM

  • 1999

Error Checking and Resetting Mechanisms for Asynchronous Interconnect

  • Proc. 18th UK Asynchronous Forum, University of Newcastle upon Tyne
  • 2006
