Brent Bohnenstiehl

1000 programmable processors and 12 independent memory modules capable of simultaneously servicing both data and instruction requests are integrated onto a 32nm PD-SOI CMOS device. At 1.1 V, processors operate up to an average of 1.78 GHz yielding a maximum total chip computation rate of 1.78 trillion instructions/sec. At 0.84 V, 1000 cores execute 1...
Due to high levels of integration, the design of many-core systems is becoming increasingly challenging. Runtime dynamic voltage and frequency scaling (DVFS) is an effective method for managing power based on performance requirements in the presence of workload variations. This paper presents an on-line scalable hardware-based dynamic voltage frequency...
We study the problem of mapping concurrent tasks of an application modeled as a data flow graph onto processors of a GALS-based manycore platform. We propose a mapping algorithm called BAMSE, which exploits the characteristics of streaming applications and the specifications of the target architecture to optimize the mapping solution. Different...
As processors move from multi-core to many-core architectures, opportunities arise for energy-efficient enterprise computations, such as sorting, on large arrays of processors. This paper proposes three different energy-efficient sorting methods for the first phase of an external sort, simulated on varying-sized fine-grained many-core processor arrays used...
Parallel processing offers well-known benefits in performance and efficiency, with many modern chip designs focusing on integrating increasing numbers of processors on a single die instead of increasing the complexity of a smaller number of processors. Many current and future computing applications, ranging from embedded Internet-of-Things devices to...
The widths of data words in digital processors have a direct impact on area in application-specific ICs (ASICs) and FPGAs. Circuit area impacts energy dissipation per workload and chip cost. Floating-point exponent and mantissa widths are independently varied for the seven major computational blocks of an airborne synthetic aperture radar (SAR) engine. The...
This paper presents the design and implementation of a software Low Density Parity Check (LDPC) decoder on the AsAP2 platform, which contains a 2D mesh of 164 programmable processors designed for general DSP applications. A software decoding algorithm is described which requires low memory overhead, and scalable methods are provided for parallelizing the...