In this paper, we present a methodology for mapping an Embedded Signal Processing (ESP) application onto HPC platforms such that the throughput performance is maximized. Previous approaches used a linear pipelined execution model, which restricts the mapping choices. We show that the "optimal" solution obtained under that model can be improved, using the ...
In this paper, we develop portable and scalable algorithms for performing irregular all-to-all communication in High Performance Computing (HPC) systems. To minimize the communication latency, the algorithm reduces the total number of messages transmitted, reduces the variance of the lengths of these messages, and overlaps the communication with ...
In irregular all-to-all communication, messages are exchanged between every pair of processors. The message sizes vary from processor to processor and are known only at run time. This is a fundamental communication primitive in parallelizing irregularly structured scientific computations. Our algorithm reduces the total number of message start-ups. It also ...
Recently, High Performance Computing (HPC) platforms have been employed to realize many computationally demanding applications in signal and image processing. These applications must meet real-time performance constraints, which include latency as well as throughput. In order to meet these performance requirements, efficient parallel ...
This paper demonstrates performance improvements for matrix multiplication and mesh generation for the Finite Element Method (FEM) by optimizing the memory hierarchy of traditional processors. The theory developed earlier is used to perform such optimizations. Our work provides a uniform methodology across multiple HPC platforms for optimizing the performance ...