Base-delta-immediate compression: Practical data compression for on-chip caches
- Gennady Pekhimenko, Vivek Seshadri, O. Mutlu, Phillip B. Gibbons, M. Kozuch, T. Mowry
- Computer Science · International Conference on Parallel…
- 19 September 2012
There is a need for a simple yet efficient compression technique that can effectively compress common in-cache data patterns, and has minimal effect on cache access latency.
Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology
- Vivek Seshadri, Donghyuk Lee, T. Mowry
- Computer Science · Micro
- 14 October 2017
Ambit is proposed: an Accelerator-in-Memory for bulk bitwise operations that largely exploits existing DRAM structure and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area).
The potential for using thread-level data speculation to facilitate automatic parallelization
- J. G. Steffan, T. Mowry
- Computer Science · Proceedings Fourth International Symposium on…
- 31 January 1998
This paper explores the potential for using thread-level data speculation (TLDS) to overcome the limits of automatic parallelization by allowing the compiler to view parallelization solely as a cost/benefit tradeoff rather than as something likely to violate program correctness.
A scalable approach to thread-level speculation
- J. G. Steffan, C. Colohan, Antonia Zhai, T. Mowry
- Computer Science · Proceedings of 27th International Symposium on…
- 1 May 2000
This paper proposes and evaluates a design for supporting TLS that seamlessly scales to any machine size because it is a straightforward extension of writeback invalidation-based cache coherence (which itself scales both up and down).
Design and evaluation of a compiler algorithm for prefetching
- T. Mowry, M. Lam, Anoop Gupta
- Computer Science · ASPLOS V
- 1 September 1992
This paper proposes a compiler algorithm to insert prefetch instructions into code that operates on dense matrices, and shows that this algorithm significantly improves the execution speed of the benchmark programs; some of the programs improve by as much as a factor of two.
Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes
- Anoop Gupta, W. Weber, T. Mowry
- Computer Science · International Conference on Parallel Processing
- 1990
As multiprocessors are scaled beyond single bus systems, there is renewed interest in directory-based cache coherence schemes that use a limited number of pointers per directory entry to keep track of all processors caching a memory block.
Compiler-based prefetching for recursive data structures
It is demonstrated that compiler-inserted prefetching can significantly improve the execution speed of pointer-based codes, by as much as 45% for the applications the authors study, and that more sophisticated prefetching algorithms can improve performance by as much as twofold.
RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization
- Vivek Seshadri, Yoongu Kim, T. Mowry
- Computer Science · Micro
- 7 December 2013
RowClone is proposed: a new and simple mechanism that performs bulk copy and initialization completely within DRAM, eliminating the need to transfer any data over the memory channel for such operations.
Linearly compressed pages: A low-complexity, low-latency main memory compression framework
- Gennady Pekhimenko, Vivek Seshadri, T. Mowry
- Computer Science · Micro
- 7 December 2013
It is shown that any compression algorithm can be adapted to fit the requirements of LCP, and two previously proposed compression algorithms are adapted to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression.
Flexible Hardware Acceleration for Instruction-Grain Program Monitoring
- Shimin Chen, M. Kozuch, Evangelos Vlachos
- Computer Science · International Symposium on Computer Architecture
- 1 June 2008
This paper identifies three significant common sources of overheads and proposes three novel hardware techniques for addressing these overheads: Inheritance Tracking, Idempotent Filters, and Metadata-TLBs, which constitute a general-purpose hardware acceleration framework.