Ahmad Khayyat

This paper considers blocking and scheduling for the design and implementation of field-programmable gate array (FPGA)-based floating-point parallel matrix multiplication in the presence of a memory hierarchy. For high performance, on-chip memory holds data that are reused when the computation is divided into blocks, and multiple arithmetic units perform…
This paper describes performance optimizations of a transfer controller for an FPGA-based blocked parallel matrix multiplication accelerator. One of the key challenges of the controller is the generation of a sequence of host memory addresses to transfer blocks of matrices between host and on-chip memories. These addresses are not contiguous, thereby…
FPGA technology constitutes an attractive platform for high-performance accelerators of parallel workloads in general-purpose computers. Matrix multiplication is a computationally intensive application that is highly parallelizable. Previous work has typically described custom floating-point components and reported on specific designs or implementations…
The IEEE 802.15.4 standard is a low-power, low-rate MAC/PHY standard that meets most of the stringent requirements of single-hop wireless sensor networks. Sensor networks with nodal populations comprised of thousands of devices have been envisioned in conjunction with environmental, vehicular, and military applications, to mention a few. However, such large…