Bitwise operations are an important component of modern-day programming and are used in a variety of applications such as databases. In this work, we propose a new and simple mechanism to implement bulk bitwise AND and OR operations in DRAM, which is faster and more efficient than existing mechanisms. Our mechanism exploits existing DRAM operation to (More)
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance(More)
Pointer chasing is a fundamental operation, used by many important data-intensive applications (e.g., databases, key-value stores, graph processing workloads) to traverse linked data structures. This operation is both memory bound and latency sensitive, as it (1) exhibits irregular access patterns that cause frequent cache and TLB misses, and (2) requires(More)
Processing-in-memory (PIM) architectures cannot use traditional approaches to cache coherence due to the high off-chip traffic consumed by coherence messages. We propose LazyPIM, a new hardware cache coherence mechanism designed specifically for PIM. LazyPIM uses a combination of speculative cache coherence and compressed(More)
Over the last decade, we have witnessed a surge of Internet of Things (IoT) devices, and with it a greater need to choreograph their actions across both time and space. Although these two problems, namely time synchronization and localization, share many aspects, they are traditionally treated separately or combined in centralized approaches(More)
OBJECT: Optimal treatment of primary and recurrent craniopharyngiomas remains controversial. Radical resection and limited resection plus radiation therapy yield similar rates of disease control and overall survival. The data are much less clear for recurrent tumors. The authors report their experience with radical resection of both primary and recurrent(More)
Long DRAM latency is a critical performance bottleneck in current systems. DRAM access latency is defined by three fundamental operations that take place within the DRAM cell array: (i) activation of a memory row, which opens the row to perform accesses; (ii) precharge, which prepares the cell array for the next memory access; and (iii) restoration of the(More)
This paper introduces a new resource virtualization framework, Zorua, that decouples the programmer-specified resource usage of a GPU application from the actual allocation in the on-chip hardware resources. Zorua enables this decoupling by virtualizing each resource transparently to the programmer. The virtualization provided by Zorua builds on two key(More)
Since 2014, I have been a Ph.D. student in the Department of Electrical and Computer Engineering at Carnegie Mellon University, advised by Professor Phillip B. Gibbons and Professor Onur Mutlu. I am interested in research problems that lie at the intersection of machine learning, distributed systems, and computer architecture. My current research focus is on(More)