Nagendra Dwarakanath Gulur

Multicore architectures place twin demands on DRAM: energy efficiency and higher performance. A variety of schemes have been proposed to address either the latency or the energy consumption of DRAMs. These schemes typically require non-trivial hardware changes and end up improving latency at the cost of energy or vice versa. One…
Stacked DRAM promises to offer unprecedented capacity and bandwidth to multi-core processors at moderately lower latency than off-chip DRAMs. A typical use of this abundant DRAM is as a large last-level cache. Prior research is divided on how to organize this cache, and the proposed organizations fall into one of two categories: (i) as a…
Memory system design is increasingly influencing modern multi-core architectures from both performance and power perspectives. However, predicting the performance of memory systems is complex, compounded by the myriad design choices and parameters along multiple dimensions, namely (i) technology, (ii) design and (iii) architectural choices. In this work, we…
DRAM memory systems require periodic recharging to avoid loss of data from leaky capacitors. These refresh operations consume energy and reduce the time for which the DRAM banks are available to service memory requests. Higher DRAM density and 3D-stacking aggravate the refresh overheads, incurring even higher energy and performance costs. …
In this work, we study the performance benefits of using asynchronous data transfers in OpenCL programs executing on media processors. Asynchronous data transfers are typically implemented using Direct Memory Access (DMA) engines that can be programmed to transfer data from one memory location to another. Asynchronous transfers can free up processing…
In this paper, we present Bi-Modal Cache, a flexible stacked DRAM cache organization that simultaneously achieves several objectives: (i) improved cache hit ratio, (ii) moving the tag storage overhead to DRAM, (iii) lower cache hit latency than tags-in-SRAM, and (iv) reduction in off-chip bandwidth wastage. The Bi-Modal Cache addresses the miss rate…
In this paper, based on the temporal and spatial locality characteristics of memory accesses in multicores, we propose a re-organization of the existing single large row buffer in a DRAM bank into multiple smaller row buffers. The proposed configuration helps improve the row hit rates and also brings down the energy required for row activations. The major…