In this paper, we present a methodology for mapping an Embedded Signal Processing (ESP) application onto HPC platforms such that the throughput performance is maximized. Previous approaches used a linear pipelined execution model, which restricts the mapping choices. We show that the "optimal" solution obtained under that model can be improved, using the...
In this paper, we develop portable and scalable algorithms for performing irregular all-to-all communication in High Performance Computing (HPC) systems. To minimize the communication latency, the algorithm reduces the total number of messages transmitted, reduces the variance of the lengths of these messages, and overlaps the communication with...
In irregular all-to-all communication, messages are exchanged between every pair of processors. The message sizes vary from processor to processor and are known only at run time. This is a fundamental communication primitive in parallelizing irregularly structured scientific computations. Our algorithm reduces the total number of message start-ups. It also...
The recent accelerated development of scalable computing systems has made possible the coordinated use of a suite of High Performance Computing (HPC) components for computationally demanding problems in embedded applications. These emerging Scalable Heterogeneous High Performance Embedded (SHHiPE) systems are designed using commercial-off-the-shelf (COTS)...
Recently, High Performance Computing (HPC) platforms have been employed to realize many computationally demanding applications in signal and image processing. These applications require real-time performance constraints to be met. These constraints include latency as well as throughput. In order to meet these performance requirements, efficient parallel...
This paper demonstrates performance improvements for matrix multiplication and mesh generation for Finite Element Method (FEM) by optimizing the memory hierarchy of traditional processors. The theory developed earlier is used to perform such optimizations. Our work provides a uniform methodology across multiple HPC platforms for optimizing the performance of...