Combining Boolean Gates and Branching Programs in One Model Can Lead to Faster Circuits
The reconfigurable mesh is a parallel model of computation that exploits a massive number of rather simple processing elements connected through a reconfigurable interconnection network. Over the past decades, the model has received strong interest, and many researchers have devised algorithms for it. However, most of this work focuses on theoretical aspects. Because of some idealistic modeling assumptions, only a few attempts have been made to implement the model and to study the practical use of reconfigurable meshes. In this paper, we leverage the reconfigurable mesh model to study potential architectures and programming models for future many-cores. We design a reconfigurable mesh in the form of a scalable soft-core array with a reconfigurable interconnect and implement it on FPGA technology to create a prototype platform. We present an overall hardware/software tool flow for generating and programming reconfigurable mesh prototypes. The new language ARMLang and a corresponding compiler facilitate the programming of these massively parallel processor arrays. To our knowledge, this work is the first practical study of word-level reconfigurable meshes. To analyze the performance of our implementation, we study four algorithmic kernels from the application domains of arithmetic, sorting, graph algorithms, and imaging. For each kernel, we devise a reconfigurable mesh program in ARMLang, compile it to our soft-core array, and measure its runtime as a function of the mesh size. Then, we compare the runtimes to those of two sequential implementations of the algorithms, which are executed on two single-core systems. The results show that many-cores leveraging the reconfigurable mesh model can efficiently use a vast number of processing elements and that, for the chosen algorithms, they come close to the performance of optimally parallelized programs.