Ravishankar Iyer

As manycore architectures enable a large number of cores on the die, a key challenge that emerges is the availability of memory bandwidth with conventional DRAM solutions. To address this challenge, integration of large DRAM caches that provide as much as 5× higher bandwidth and as little as one-third the latency of conventional DRAM is …
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL, offer a cost-effective platform for many applications by providing high thread-level parallelism at lower energy budgets. Unfortunately, for many general-purpose applications, available hardware resources of a GPGPU are not efficiently utilized, leading to lost opportunity in …
High density, low leakage, and non-volatility are the attractive features of Spin-Transfer-Torque RAM (STT-RAM), which have made it a strong competitor against SRAM as a universal memory replacement in multi-core systems. However, STT-RAM suffers from high write latency and energy, which have impeded its widespread adoption. To this end, we look at trading off …
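One common way to trade off STT-RAM's high write cost is hybrid placement: keep write-hot blocks in a small SRAM region and read-mostly blocks in dense STT-RAM. The sketch below is our illustration of that general idea, not the paper's specific scheme; the threshold and data structures are assumptions.

```python
# Hedged sketch of write-intensity-based block placement in a hybrid
# SRAM/STT-RAM cache. Blocks that are written often go to SRAM (cheap
# writes); read-mostly blocks go to STT-RAM (dense, low leakage).
# WRITE_THRESHOLD is an illustrative tuning knob, not from the paper.

WRITE_THRESHOLD = 4  # writes per monitoring interval before "write-hot"

def place_block(write_counts, block, sram, sttram):
    """Route a block to SRAM or STT-RAM based on its recent write count."""
    if write_counts.get(block, 0) >= WRITE_THRESHOLD:
        sram.add(block)
        sttram.discard(block)
    else:
        sttram.add(block)
        sram.discard(block)
```

A real design would bound the SRAM region's capacity and decay the write counters periodically; this sketch only shows the placement decision itself.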
The trends in enterprise IT toward service-oriented computing, server consolidation, and virtual computing point to a future in which workloads are becoming increasingly diverse in terms of performance, reliability, and availability requirements. It can be expected that more and more applications with diverse requirements will run on a CMP and share …
In this paper, we present techniques that coordinate the thread scheduling and prefetching decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better tolerate long memory latencies. We demonstrate that existing warp scheduling policies in GPGPU architectures are unable to effectively incorporate data prefetching. The main reason …
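The core tension here is that consecutive warps often touch adjacent cache lines, so scheduling them back-to-back leaves no time for a prefetch issued on behalf of one warp to cover its neighbor's data. One way to coordinate the two is to place neighboring warps in different scheduling groups, so prefetches have time to complete before the group switch. The grouping below is our illustrative sketch of that idea, not the paper's exact policy.

```python
# Hedged sketch: prefetch-aware warp grouping. Warps with consecutive
# ids (which tend to access adjacent cache lines) are assigned to
# different scheduling groups, so a prefetch issued while one group
# executes can land before the neighboring warp is scheduled.

def prefetch_aware_groups(num_warps, num_groups):
    """Assign neighboring warp ids to different groups (round-robin)."""
    groups = [[] for _ in range(num_groups)]
    for w in range(num_warps):
        groups[w % num_groups].append(w)
    return groups
```

With 8 warps and 2 groups this yields [[0, 2, 4, 6], [1, 3, 5, 7]]: every pair of neighboring warps is split across groups, giving the prefetcher a full group's worth of execution time to fetch the other group's data.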
Specialized hardware accelerators are enabling today's System on Chip (SoC) platforms to target various applications. In this paper we show that as these SoCs evolve in complexity and usage, the programming models for such platforms need to evolve beyond the traditional driver-oriented architecture. Using a test setup that employs a programmable FPGA-based …
Almost all hardware platforms to date have been homogeneous, with one or more identical processors managed by the operating system (OS). However, recently, it has been recognized that power constraints and the need for domain-specific high-performance computing may lead architects toward building heterogeneous architectures and platforms in the near future. …
Cache hierarchies in future many-core processors are expected to grow in size and contribute a large fraction of overall processor power and performance. In this paper, we postulate a 3D chip design that stacks SRAM and DRAM upon processing cores and employs OS-based page coloring to minimize horizontal communication of cache data. We then propose a …
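OS-based page coloring works because the physical page number overlaps the cache-set index bits: by choosing which physical pages to hand out, the OS controls which cache region (and hence which stacked bank or slice) a process's data lands in. The sketch below illustrates the mechanism under assumed cache geometry; the sizes and allocator are our illustration, not the paper's design.

```python
# Hedged sketch of OS page coloring. The "color" of a physical page is
# the portion of the cache set index above the page offset; pages of the
# same color map to the same group of cache sets. Geometry is assumed.

PAGE_SIZE = 4096    # 4 KiB pages
LINE_SIZE = 64      # cache line size in bytes
NUM_SETS = 8192     # sets in one cache bank
NUM_COLORS = (NUM_SETS * LINE_SIZE) // PAGE_SIZE  # 128 color classes here

def page_color(phys_addr):
    """Color = the set-index bits that lie above the page offset."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

def pick_page(free_pages, preferred_color):
    """Allocate a free physical page of the preferred color if one exists,
    otherwise fall back to any free page."""
    for p in free_pages:
        if page_color(p) == preferred_color:
            free_pages.remove(p)
            return p
    return free_pages.pop(0)
```

In the 3D-stacked setting described above, the OS would prefer colors that map to the cache slice stacked nearest the requesting core, keeping cache traffic vertical rather than horizontal.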
In current chip multiprocessors (CMPs), a significant portion of the die is consumed by the last-level cache. Until recently, the balance of cache and core space has been primarily guided by the needs of single applications. However, as multiple applications or virtual machines (VMs) are consolidated on such a platform, researchers have observed that not …
Today's multicore processors already integrate multiple cores on a die. Many-core architectures enable far more small cores for throughput computing. The key challenge in many-core architectures is the memory bandwidth wall: supplying enough memory bandwidth to keep all cores running smoothly. In the past, researchers have …