Heiner Giefers

Finite difference methods are widely used, highly parallel algorithms for solving differential equations. However, the algorithms are memory-bound and thus difficult to implement efficiently on CPUs or GPUs. In this work, we study the implementation of the finite difference time domain (FDTD) method for solving Maxwell's equations on an FPGA-based Maxeler …
Networks-on-chip (NoCs) are very efficient for point-to-point communication but are also known to provide poor broadcast and multicast performance. In this paper, we propose a triple hybrid interconnect for many-cores, consisting of a reconfigurable mesh network and a wormhole-routed NoC for data communication, and a barrier network for synchronization. On …
The reconfigurable mesh is a model for massively parallel computing for which many algorithms with very low complexity have been developed. These algorithms execute cycles of bus configuration, communication, and constant-time computation on all processing elements in lock-step. In this paper, we investigate the use of reconfigurable meshes as …
Extracting information from unstructured text data is a compute-intensive task. The performance of general-purpose processors cannot keep up with the rapid growth of textual data. Therefore, we discuss the use of FPGAs to perform large-scale text analytics. We present a framework consisting of a compiler and an operator library capable of generating a …
The reconfigurable mesh is a very popular model for massively parallel computation for which a large body of algorithms with exceptionally low runtime complexities exists. However, these low complexities cannot be exploited due to the unrealistic assumption that communication time is either constant or logarithmic in the number of cores. Nevertheless, …
The reconfigurable mesh serves as a theoretical model for massively parallel computing, but has recently been investigated as a practical architecture for many-cores with light-weight, circuit-switched interconnects. There is a lack of programming environments, including languages, compilers, and debuggers, for reconfigurable meshes. In this paper, we …
The energy efficiency of computer systems can be increased by migrating computational kernels that are known to under-utilize the CPU to an FPGA-based coprocessor. In contrast to traditional I/O-based coprocessors that require explicit data movement, coherently attached accelerators can operate on the same virtual address space as the host CPU. A shared …
In this paper, we present an approach for low-power driven synthesis based on local frequency/voltage scaling. During the scheduling phase of the high-level synthesis (HLS), the design is partitioned into different frequency/voltage islands. Operators within these islands are encapsulated by wrappers to ensure correct dataflow between the islands. A …
Hardware accelerators have evolved as the most prominent vehicle to meet the demanding performance and energy-efficiency constraints of modern computer systems. The prevalent type of hardware accelerator in the high-performance computing domain is the PCIe-attached co-processor, to which the CPU can offload compute-intensive tasks. In this paper, we analyze …