OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures

@inproceedings{Zhan2016OSCAROS,
  title={{OSCAR}: Orchestrating {STT-RAM} cache traffic for heterogeneous {CPU-GPU} architectures},
  author={Jia Zhan and Onur Kayiran and Gabriel H. Loh and Chita R. Das and Yuan Xie},
  booktitle={2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)},
  year={2016},
  pages={1--13}
}

As we integrate data-parallel GPUs with general-purpose CPUs on a single chip, the enormous cache traffic generated by GPUs will not only exhaust the limited cache capacity but also severely interfere with CPU requests. Such heterogeneous multicores pose significant challenges to the design of the shared last-level cache (LLC). This problem can be mitigated by replacing the SRAM LLC with emerging non-volatile memories like Spin-Transfer Torque RAM (STT-RAM), which provides larger cache capacity and…
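The capacity-contention argument above can be illustrated with a toy model. This is a minimal sketch, not OSCAR's mechanism (the truncated abstract does not describe it): it assumes a fully associative LRU cache, a CPU that cyclically reuses a fixed working set, and a GPU that streams through lines it never reuses. The names `LRUCache` and `cpu_hit_rate`, and the sizes used (a 32-line working set, 64- vs. 256-line caches), are hypothetical; the 4x capacity increase merely stands in for a denser STT-RAM array and is not a figure from the paper.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: OrderedDict = OrderedDict()

    def access(self, key) -> bool:
        """Return True on a hit; on a miss, insert key, evicting the LRU line if full."""
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[key] = None
        return False


def cpu_hit_rate(capacity: int, cpu_working_set: int, gpu_per_cpu: int, rounds: int = 20) -> float:
    """Fraction of CPU accesses that hit when GPU streaming traffic shares the cache.

    Hypothetical access pattern: the CPU walks its working set once per round,
    and each CPU access is followed by `gpu_per_cpu` GPU accesses to lines
    that are never reused.
    """
    cache = LRUCache(capacity)
    next_gpu_line = 0
    hits = 0
    total = 0
    for _ in range(rounds):
        for line in range(cpu_working_set):
            hits += cache.access(("cpu", line))  # True counts as 1
            total += 1
            for _ in range(gpu_per_cpu):
                cache.access(("gpu", next_gpu_line))  # streaming, no reuse
                next_gpu_line += 1
    return hits / total


for capacity, gpu_per_cpu in [(64, 0), (64, 4), (256, 4)]:
    print(capacity, gpu_per_cpu, cpu_hit_rate(capacity, 32, gpu_per_cpu))
```

Under these assumptions, the CPU working set hits on every access after the first (cold) round when running alone in 64 lines, misses on every access once four streaming GPU accesses follow each CPU access, and recovers its hit rate when the cache grows to 256 lines.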

Key Quantitative Results

  • Simulation results on a system with 28 GPU cores and 14 CPU cores show average performance improvements of 17.4% for CPUs and 10.8% for GPUs, along with a 28.9% LLC energy saving, compared to an SRAM-based LLC design.

Citations

Publications citing this paper (17 in total; a selection is listed).

Opportunistic computing in GPU architectures

  • Cites methods
  • Highly influenced

Heterogeneity Aware Shared DRAM Cache for Integrated Heterogeneous Architectures

  • Cites results, background, and methods
  • Highly influenced

Scaling Datacenter Accelerators with Compute-Reuse Architectures

  • 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)
  • 2018
  • Cites methods
  • Highly influenced

FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads

  • 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
  • 2019
  • Cites background

