Ahmed El-Mahdy

Modern processors incorporate SIMD instructions to improve the performance of multimedia applications. Vectorizing compilers are therefore sought to generate SIMD instructions efficiently. The existence of different families of SIMD instruction sets makes the task of compiler writers more complex. Moreover, virtual machines, such as JVMs, are currently…
3D transpose is an important operation in many large-scale scientific applications such as seismic and medical imaging. This paper proposes a novel algorithm for a fast in-place 3D transpose operation. The algorithm exploits a Single Instruction Multiple Data (SIMD) multicore architecture with a software-managed memory hierarchy. Such architectural…
The JAMAICA project is concerned with the design of a single-chip multi-processor aimed specifically at multi-threaded Java implementation. It has wide-ranging aims that require research in a variety of hardware and software areas. The project is still in its early stages and most of the work is still to be done. This paper provides an overview of the project as…
The significance of load balancing is rising with the increasing number of processing cores per chip. A fast load balancer is sought to exploit the fine-grained parallelism possible with multicore processors. This paper focuses on load balancing for image processing applications where the amount of processing varies per pixel; this application domain includes high…
Finding an optimal solution for traffic signal control durations is a computationally intensive task. It is typically O(T^3) in time and O(T^2) in space, where T is the length of the control interval in discrete time steps. In this paper, we propose a linear time and space algorithm for the same problem. The algorithm provides for an efficient dynamic…