The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to…
The scan primitives are powerful, general-purpose data-parallel primitives that are building blocks for a broad range of applications. We describe GPU implementations of these primitives, specifically an efficient formulation and implementation of segmented scan, on NVIDIA GPUs using the CUDA API. Using the scan primitives, we show novel GPU…
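For readers unfamiliar with the term, the following is a minimal sequential sketch of what a segmented (inclusive, sum) scan computes. It is a reference for the semantics only, not the GPU formulation the abstract refers to. The function name `segmented_inclusive_scan` and the head-flag representation of segments are assumptions made for illustration.

```python
from typing import List


def segmented_inclusive_scan(values: List[int], head_flags: List[int]) -> List[int]:
    """Sequential reference for an inclusive segmented sum scan.

    head_flags[i] == 1 marks the first element of a new segment
    (an assumed representation). The running sum restarts at every
    segment head, so each segment is scanned independently.
    """
    if len(values) != len(head_flags):
        raise ValueError("values and head_flags must have the same length")

    out: List[int] = []
    running = 0
    for value, is_head in zip(values, head_flags):
        if is_head:
            running = value      # restart the sum at a segment boundary
        else:
            running += value     # continue the sum within the segment
        out.append(running)
    return out


if __name__ == "__main__":
    # Three segments: [1, 2, 3], [4, 5], [6]
    print(segmented_inclusive_scan([1, 2, 3, 4, 5, 6], [1, 0, 0, 1, 0, 1]))
```

A data-parallel version produces the same output but replaces the sequential loop with a tree-structured combination of (flag, value) pairs.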
The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and…
Processor architectures with tens to hundreds of arithmetic units are emerging to handle media processing applications. These applications, such as image coding, image synthesis, and image understanding, require arithmetic rates of up to 10^11 operations per second. As the number of arithmetic units in a processor increases to meet these demands, register…
Media applications, such as image processing, signal processing, video, and graphics, require high computation rates and data bandwidths. The stream programming model is a natural and powerful way to describe these applications. Expressing media applications in this model allows hardware and software systems to take advantage of their concurrency and…
Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are poorly matched to conventional microprocessor architectures, they are a good fit for modern VLSI technology with its high arithmetic capacity but limited global bandwidth. The…