Base-delta-immediate compression: Practical data compression for on-chip caches
TLDR
Base-Delta-Immediate (BΔI) compression is proposed: a simple yet efficient technique that effectively compresses common in-cache data patterns while having minimal effect on cache access latency.
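The base+delta idea can be sketched in a few lines of Python. This is a toy model, not the paper's implementation: the function name is illustrative, and it omits BΔI's multiple base/delta size combinations and its zero- and repeated-value special cases.

```python
def bdi_compress(line, base_size=8, delta_size=1):
    """Try to compress a cache line (a list of base_size-byte values) as one
    base value plus narrow signed deltas. Returns None if any delta does not
    fit in delta_size bytes, in which case the line stays uncompressed."""
    base = line[0]
    limit = 1 << (8 * delta_size - 1)          # signed range of a delta
    deltas = [v - base for v in line]
    if all(-limit <= d < limit for d in deltas):
        # compressed size = base_size + len(line) * delta_size bytes,
        # versus len(line) * base_size bytes uncompressed
        return (base, delta_size, deltas)
    return None

# Pointer-like values with small differences compress well:
compressed = bdi_compress([0x1000, 0x1008, 0x1010, 0x1018])
```

The key observation is that many cache lines hold values (pointers, array elements, counters) that are numerically close to each other, so one wide base plus narrow deltas suffices.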
Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology
TLDR
Ambit is proposed, an Accelerator-in-Memory for bulk bitwise operations that largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area).
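Ambit's core trick, simultaneously activating three DRAM rows so each bitline settles to the bitwise majority of the three cells, can be modeled in Python. This is a behavioral sketch under that one assumption; the helper names are illustrative and all timing, copying of operands into designated rows, and the NOT operation are omitted.

```python
MASK = (1 << 64) - 1  # model a 64-bit slice of a DRAM row

def triple_row_activate(a, b, c):
    """Activating three rows at once drives each bitline to the majority
    of the three cell values (the charge-sharing result Ambit exploits)."""
    return (a & b) | (b & c) | (a & c)

def bulk_and(a, b):
    return triple_row_activate(a, b, 0)     # control row preset to all zeros

def bulk_or(a, b):
    return triple_row_activate(a, b, MASK)  # control row preset to all ones
```

Fixing the third (control) row to all zeros or all ones turns the majority function into bitwise AND or OR over entire rows at once, which is where the bulk throughput comes from.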
A scalable approach to thread-level speculation
TLDR
This paper proposes and evaluates a design for supporting TLS that seamlessly scales to any machine size because it is a straightforward extension of writeback invalidation-based cache coherence (which itself scales both up and down).
The potential for using thread-level data speculation to facilitate automatic parallelization
TLDR
The potential for using thread-level data speculation (TLDS) to overcome the limits of static dependence analysis is explored, allowing the compiler to view parallelization as a pure cost/benefit tradeoff rather than as something likely to violate program correctness.
Compiler-based prefetching for recursive data structures
TLDR
It is demonstrated that compiler-inserted prefetching can significantly improve the execution speed of pointer-based codes, by as much as 45% for the applications the authors study, and in some cases can improve performance by as much as twofold.
Design and evaluation of a compiler algorithm for prefetching
TLDR
This paper proposes a compiler algorithm to insert prefetch instructions into code that operates on dense matrices, and shows that it significantly improves the execution speed of the benchmark programs, some by as much as a factor of two.
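The scheduling rule at the heart of software-pipelined prefetching is a one-line computation: issue each prefetch enough loop iterations ahead that the data arrives before it is used. A minimal sketch (the function name and the example cycle counts are illustrative, not taken from the paper):

```python
import math

def prefetch_distance(miss_latency_cycles, loop_iter_cycles):
    """Number of iterations ahead to issue a prefetch so that it completes
    by the time the corresponding iteration executes: ceil(l / s), where l
    is the miss latency and s is the time of one loop iteration."""
    return math.ceil(miss_latency_cycles / loop_iter_cycles)

# e.g. a 100-cycle miss latency over a 36-cycle loop body:
d = prefetch_distance(100, 36)
```

The compiler then software-pipelines the loop around this distance (a prolog that only prefetches, a steady state that prefetches d iterations ahead, and an epilog with no prefetches), and uses locality analysis to avoid issuing prefetches for data already in cache.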
RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization
TLDR
RowClone is proposed, a new and simple mechanism to perform bulk copy and initialization completely within DRAM — eliminating the need to transfer any data over the memory channel to perform such operations.
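RowClone's in-bank copy can be illustrated with a toy bank model: activating the source row latches it into the row buffer, and activating the destination row while that data is latched writes it into the destination cells, so nothing crosses the memory channel. The class and method names below are invented for illustration, and real DRAM constraints (same-subarray requirement, timing) are ignored.

```python
class DramBankModel:
    """Toy DRAM bank: rows of bytes plus a single row buffer."""
    def __init__(self, rows, row_bytes):
        self.rows = [bytearray(row_bytes) for _ in range(rows)]
        self.row_buffer = bytearray(row_bytes)
        self.channel_bytes = 0                # data moved over the channel

    def rowclone_copy(self, src, dst):
        self.row_buffer[:] = self.rows[src]   # ACTIVATE src: latch into buffer
        self.rows[dst][:] = self.row_buffer   # ACTIVATE dst: buffer drives cells

    def cpu_copy(self, src, dst):
        data = bytes(self.rows[src])          # read traverses the channel...
        self.channel_bytes += 2 * len(data)   # ...and so does the write back
        self.rows[dst][:] = data

bank = DramBankModel(rows=4, row_bytes=8)
bank.rows[0][:] = b"ABCDEFGH"
bank.rowclone_copy(0, 2)
```

Comparing the two copy paths makes the benefit concrete: the RowClone path moves zero bytes over the channel, while a conventional copy moves every byte twice.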
Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes
TLDR
As multiprocessors are scaled beyond single-bus systems, there is renewed interest in directory-based cache coherence schemes that use a limited number of pointers per directory entry to keep track of all processors caching a memory block.
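A limited-pointer directory entry is easy to sketch: track up to i sharers exactly, and fall back to broadcast invalidation once the pointers overflow. This is a generic Dir_i-B-style sketch with invented names, not any specific scheme evaluated in the paper.

```python
class LimitedDirEntry:
    """Directory entry with at most `i` sharer pointers; overflow loses
    precision and forces broadcast invalidation on the next write."""
    def __init__(self, i):
        self.i = i
        self.pointers = set()
        self.broadcast = False

    def add_sharer(self, cpu):
        if self.broadcast:
            return                       # already imprecise
        if cpu in self.pointers or len(self.pointers) < self.i:
            self.pointers.add(cpu)
        else:
            self.broadcast = True        # too many sharers to track exactly

    def invalidation_targets(self, all_cpus):
        return set(all_cpus) if self.broadcast else set(self.pointers)

entry = LimitedDirEntry(i=2)
for cpu in (1, 2):
    entry.add_sharer(cpu)
```

The memory saving is the point: storage per entry grows with i rather than with the processor count, at the cost of extra invalidation traffic for the (empirically rare) widely shared blocks.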
Linearly compressed pages: A low-complexity, low-latency main memory compression framework
TLDR
It is shown that any compression algorithm can be adapted to fit the requirements of LCP, and two previously proposed algorithms are adapted to it: Frequent Pattern Compression and Base-Delta-Immediate Compression.
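LCP's low latency comes from compressing every cache line within a page to the same fixed size, so locating line i needs only a multiply-add rather than per-line offset metadata. A minimal sketch of that address computation (the function name is illustrative, and LCP's exception region for lines that do not fit is omitted):

```python
def lcp_line_address(page_base, line_index, comp_line_bytes):
    """Main-memory address of cache line `line_index` in a linearly
    compressed page: all lines have the same compressed size, so the
    offset is a simple product."""
    return page_base + line_index * comp_line_bytes

# A 4 KB page of 64 lines compressed to 16 bytes each fits in 1 KB,
# and line 3 sits at a fixed, directly computable offset:
addr = lcp_line_address(0x10000, 3, 16)
```

Lines that cannot be compressed to the fixed size are stored uncompressed in a separate exception region of the page, keeping the common-case address computation trivial.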
Flexible Hardware Acceleration for Instruction-Grain Program Monitoring
TLDR
This paper identifies three significant common sources of overheads and proposes three novel hardware techniques for addressing these overheads: Inheritance Tracking, Idempotent Filters, and Metadata-TLBs, which constitute a general-purpose hardware acceleration framework.