SpZip: Architectural Support for Effective Data Compression In Irregular Applications
@article{Yang2021SpZipAS,
  title={SpZip: Architectural Support for Effective Data Compression In Irregular Applications},
  author={Yifan Yang and Joel S. Emer and Daniel S{\'a}nchez},
  journal={2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)},
  year={2021},
  pages={1069-1082}
}
Irregular applications, such as graph analytics and sparse linear algebra, exhibit frequent indirect, data-dependent accesses to single or short sequences of elements that cause high main memory traffic and limit performance. Data compression is a promising way to accelerate irregular applications by reducing memory traffic. However, software compression adds substantial overheads, and prior hardware compression techniques work poorly on the complex access patterns of irregular applications. We…
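As an illustration of the indirect, data-dependent accesses the abstract describes, here is a minimal sketch of sparse matrix-vector multiplication over a compressed sparse row (CSR) layout. The matrix, the vector, and the function name `spmv` are hypothetical and chosen only for illustration; this is not taken from the paper and does not show SpZip's mechanism.

```python
# Hypothetical 3x4 sparse matrix in CSR form (illustrative values only).
row_ptr = [0, 2, 3, 5]            # nonzeros of row i live at row_ptr[i] .. row_ptr[i+1]-1
col_idx = [0, 3, 1, 0, 2]         # column index of each nonzero
vals = [1.0, 2.0, 3.0, 4.0, 5.0]  # value of each nonzero
x = [10.0, 20.0, 30.0, 40.0]      # dense input vector


def spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a CSR matrix A."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Each row touches only a short run of col_idx/vals entries.
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # x is indexed by a value loaded from col_idx: an indirect,
            # data-dependent access with little spatial locality.
            y[i] += vals[j] * x[col_idx[j]]
    return y


y = spmv(row_ptr, col_idx, vals, x)
print(y)
```

The lookup `x[col_idx[j]]` is the kind of access in question: the address depends on data that was itself just loaded, so consecutive iterations can land on unrelated cache lines.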
References
Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design
- Computer Science
- 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
- 2021
This work presents Prodigy, a low-cost hardware-software co-design solution for intelligent prefetching that improves the memory latency of several important irregular workloads, and compares its performance against a non-prefetching baseline as well as state-of-the-art prefetchers.
Linearly compressed pages: A low-complexity, low-latency main memory compression framework
- Computer Science
- 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
- 2013
It is shown that any compression algorithm can be adapted to fit the requirements of LCP, and two previously proposed compression algorithms are adapted to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression.
Optimizing indirect memory references with milk
- Computer Science
- 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT)
- 2016
Modern applications such as graph and data analytics, when operating on real world data, have working sets much larger than cache capacity and are bottlenecked by DRAM. To make matters worse, DRAM…
Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures
- Computer Science
- 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
- 2016
This paper presents a novel and lightweight compression algorithm, Bit-Plane Compression (BPC), that increases the effective memory bandwidth and significantly reduces memory bandwidth requirements.
An Event-Triggered Programmable Prefetcher for Irregular Workloads
- Computer Science
- ASPLOS 2018
- 2018
This work proposes an event-triggered programmable prefetcher that combines the flexibility of a general-purpose computational unit with an event-based programming model, along with compiler techniques that automatically generate events from annotated original source code.
Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism
- Computer Science
- 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
- 2020
This work presents Pipette, a technique that enables cheap pipeline parallelism within each core using architecturally visible queues; by time-multiplexing stages on the same core, it avoids load imbalance and achieves high core IPC.
IMP: Indirect memory prefetcher
- Computer Science
- 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
- 2015
This work proposes an efficient hardware indirect memory prefetcher (IMP) to capture this access pattern and hide latency, and proposes a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth pressure from the lack of spatial locality.
SC2: A statistical compression cache scheme
- Computer Science
- 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
- 2014
This paper presents, for the first time, a detailed design-space exploration of caches that utilize statistical compression and shows that more aggressive approaches like Huffman coding, which have been neglected in the past due to the high processing overhead for (de)compression, are suitable techniques for caches and memory.
When is Graph Reordering an Optimization? Studying the Effect of Lightweight Graph Reordering Across Applications and Input Graphs
- Computer Science
- 2018 IEEE International Symposium on Workload Characterization (IISWC)
- 2018
This work identifies lightweight reordering techniques that improve performance even after accounting for the overhead of reordering, and addresses a major impediment to the general adoption of these techniques (input-dependent speedups) by linking the speedup from lightweight reordering to structural properties of the input graph.
QEI: Query Acceleration Can be Generic and Efficient in the Cloud
- Computer Science
- 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
- 2021
This paper proposes QEI, a generic, integrated, and efficient acceleration solution for various data structure queries that allows multiple query operations to execute in parallel to maximize throughput and proposes a novel way to integrate the accelerator into the CPU that balances performance, latency, and hardware cost.