Learn More
FPGAs are increasingly being used in the high performance and scientific computing community to implement floating-point based hardware accelerators. In this paper we analyze the floating-point multiplier and adder/subtractor units by considering the number of pipeline stages of the units as a parameter and use through-put/area as the metric. We achieve(More)
—We develop new algorithms and architectures for matrix multiplication on configurable devices. These have reduced energy dissipation and latency compared with the state-of-the-art field-programmable gate array (FPGA)-based designs. By profiling well-known designs, we identify " energy hot spots, " which are responsible for most of the energy dissipation.(More)
Advances in their technologies have positioned FPGAs and embedded processors to compete with digital signal processors (DSPs). In this paper, we evaluate the performance in terms of both latency and energy-efficiency of FP-GAs, embedded processors, and DSPs in multiplying two ¢ ¤ £ ¥ ¢ matrices. As specific examples, we have chosen a representative of each(More)
In this paper, we present techniques for energy-efficient design at the algorithm level using FPGAs. We then use these techniques to create energy-efficient designs for two signal processing kernel applications: fast Fourier transform (FFT) and matrix multiplication. We evaluate the performance, in terms of both latency and energy efficiency, of FPGAs in(More)
We develop new algorithms and architectures for matrix multiplication on configurable hardware. These designs significantly reduce the latency as well as the area. Our designs improve the previous designs in [7] and [1] in terms of the area/speed metric where the speed denotes the maximum achievable running frequency. The area/speed metrics for the
In this paper, we first develop a novel architecture for fixed-point LU decomposition of streaming input matrices, on FPGAs. Our architecture, based on a circular linear array , achieves the minimal latency and is resource-efficient. We then extend it, by using a stacked matrices approach, to a floating-point based architecture which achieves the minimal(More)
In this paper, new algorithms and architectures for matrix factorization are presented. Two fully-parallel and block-based designs for LU decomposition on configurable devices are proposed. A linear array architecture is employed to minimize the usage of long interconnects, leading to lower energy dissipation. The designs are made scalable by using a fixed(More)
Reconfigurable System-on-Chip (RSoC) devices are being used to implement many battery operated systems, where energy efficiency is a major concern. RSoCs incorporate many different components, such as processor core, reconfigurable logic, memory, etc. Various power management techniques can be applied to these components. Tasks within an application can be(More)
In this paper, we develop energy efficient designs for the Fast Fourier Transform (FFT) on FPGAs. Architectures for FFT on FPGAs are designed by investigating and applying techniques for minimizing the energy dissipation. Architectural parameters such as degrees of vertical and horizontal parallelism are identified and a design domain is created through a(More)