Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising results on simulators that model a Simultaneous…
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multicore platform, since these specialized accelerators feature ISAs and functionality that are significantly different…
Recent extensions to the Intel® Architecture feature the SIMD technique to enhance the performance of computationally intensive applications that perform the same operation on different elements in a data set. To date, much of the code that exploits these extensions has been hand-coded. The task of the programmer is substantially simplified, however, if a…
Current processor trends of integrating more cores with wider SIMD units, along with a deeper and more complex memory hierarchy, have made it increasingly challenging to extract performance from applications. It is believed by some that traditional approaches to programming do not apply to these modern processors and hence radical new languages must be…
Summary form only given. Exploiting thread-level parallelism is a promising way to improve the performance of multimedia applications that are running on multithreading general-purpose processors. We describe the work in developing our threaded H.264 encoder. We parallelize the H.264 encoder using the OpenMP programming model, which allows us to leverage…
In the never-ending quest for higher performance, CPUs become faster and faster. Processor resources, however, are generally underutilized by many applications. Intel's Hyper-Threading Technology is developed to resolve this issue. This new technology allows a single processor to manage data as if it were two processors by executing data instructions from…
Multi-cores, such as the Intel® Core™2 Duo processor, facilitate efficient thread-level parallel execution of ordinary programs, wherein the different threads of execution are mapped onto different physical processors. In this context, several techniques have been proposed for auto-parallelization of programs. Recently, thread-level speculation…
The ability of parallel computers to execute multiple instruction streams (tasks) simultaneously gives rise to the problem of partitioning a program into a set of tasks that can be assigned to different processors. The degree to which parallelism can be exploited, the amount of overhead involved during parallel execution of a program, and a number of other…