Ahmed El-Mahdy

Modern processors incorporate SIMD instructions to improve the performance of multimedia applications. Vectorizing compilers are therefore sought to efficiently generate SIMD instructions. Because several families of SIMD instruction sets exist, the task of compiler writers becomes more complex. Moreover, virtual machines, such as JVMs, are currently...
3D transpose is an important operation in many large-scale scientific applications such as seismic and medical imaging. This paper proposes a novel algorithm for fast in-place 3D transpose. The algorithm exploits Single Instruction Multiple Data (SIMD) multicore architectures with a software-managed memory hierarchy. Such architectural...
The significance of load balancing is rising with the increasing number of processing cores per chip. A fast load balancer is sought to exploit the fine-grained parallelism possible with multicore processors. This paper focuses on load balancing for image processing applications where the amount of processing varies per pixel; this application domain includes high...
Finding an optimal solution for traffic signal control durations is a computationally intensive task. It is typically O(T^3) in time and O(T^2) in space, where T is the length of the control interval in discrete time steps. In this paper, we propose a linear time and space algorithm for the same problem. The algorithm provides for an efficient dynamic...
Control-flow dependence is an intrinsic limiting factor for program acceleration. With the availability of instruction-level parallel architectures, if-conversion optimization has, therefore, become pivotal for extracting parallelism from serial programs. While many if-conversion optimization heuristics have been proposed in the literature, most of them...