• Publications
  • Influence
Base-delta-immediate compression: Practical data compression for on-chip caches
TLDR
A simple yet efficient compression technique that can effectively compress common in-cache data patterns, and has minimal effect on cache access latency. Cache compression is a promising technique to increase on-chip cache capacity and to decrease on-chip and off-chip bandwidth usage.
  • 258
  • 72
Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology
Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to…
  • 196
  • 43
A scalable approach to thread-level speculation
TLDR
We propose and evaluate a design for supporting TLS that seamlessly scales to any machine size because it is a straightforward extension of writeback invalidation-based cache coherence (which itself scales both up and down).
  • 394
  • 39
The potential for using thread-level data speculation to facilitate automatic parallelization
  • J. Steffan, T. Mowry
  • Computer Science
  • Proceedings Fourth International Symposium on…
  • 31 January 1998
TLDR
We explore the potential for using thread-level data speculation (TLDS) to overcome this limitation by allowing the compiler to view parallelization solely as a cost/benefit tradeoff rather than something which is likely to violate program correctness.
  • 376
  • 35
Compiler-based prefetching for recursive data structures
TLDR
This paper investigates compiler-based prefetching for pointer-based applications, in particular those containing recursive data structures.
  • 414
  • 30
Design and evaluation of a compiler algorithm for prefetching
TLDR
This paper proposes a compiler algorithm to insert prefetch instructions into code that operates on dense matrices.
  • 822
  • 28
Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes
As multiprocessors are scaled beyond single bus systems, there is renewed interest in directory-based cache coherence schemes. These schemes rely on a directory to keep track of all processors…
  • 304
  • 27
RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization
TLDR
We propose RowClone, a new and simple mechanism to perform bulk copy and initialization completely within DRAM, eliminating the need to transfer any data over the memory channel to perform such operations.
  • 231
  • 27
Flexible Hardware Acceleration for Instruction-Grain Program Monitoring
TLDR
In this paper, we propose a flexible hardware solution for accelerating a wide range of instruction-grain program monitoring tools.
  • 136
  • 20
Linearly compressed pages: A low-complexity, low-latency main memory compression framework
TLDR
This paper proposes a new approach to main memory compression, Linearly Compressed Pages (LCP), that avoids the performance degradation problem without requiring costly or energy-inefficient hardware.
  • 115
  • 20