• Corpus ID: 67770718

Understanding the Interactions of Workloads and DRAM Types: A Comprehensive Experimental Study

@article{Ghose2019UnderstandingTI,
  title={Understanding the Interactions of Workloads and DRAM Types: A Comprehensive Experimental Study},
  author={Saugata Ghose and Tianshi Li and Nastaran Hajinazar and Damla Senol Cali and Onur Mutlu},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.07609}
}
It has become increasingly difficult to understand the complex interaction between modern applications and main memory, composed of DRAM chips. Notably, we find that (1) newer DRAM types such as DDR4 and HMC often do not outperform older types such as DDR3, due to higher access latencies and, in the case of HMC, poor exploitation of locality; (2) there is no single DRAM type that can cater to all components of a heterogeneous system (e.g., GDDR5 significantly outperforms other memories for…
CROW: A Low-Cost Substrate for Improving DRAM Performance, Energy Efficiency, and Reliability
TLDR
This work proposes Copy-Row DRAM (CROW), a flexible substrate that enables new mechanisms for improving DRAM performance, energy efficiency, and reliability and uses the CROW substrate to implement a low-cost in-DRAM caching mechanism that lowers DRAM activation latency to frequently-accessed rows by 38% and a mechanism that avoids the use of short-retention-time rows to mitigate the performance and energy overhead of DRAM refresh operations.
Towards Application-Specific Address Mapping for Emerging Memory Devices
TLDR
This work calculates window-based probabilistic entropy for groups of address bits to determine a near-optimal address mapping, and presents simulation results for ten applications showing that the proposed approach improves performance by up to 25% over fixed address mapping and by up to 8% over previous application-specific address mappings.
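
To make the bit-entropy idea concrete, a minimal sketch follows: it estimates per-bit entropy over windows of a memory address trace, a rough proxy for which address bits toggle enough to serve as interleaving (e.g., bank or channel) bits. This is an illustration only, not the paper's algorithm; the function name, window size, and the non-overlapping-window simplification are assumptions.

    # Illustrative sketch only: per-bit entropy over windows of an address trace.
    from math import log2

    def windowed_bit_entropy(addresses, num_bits=32, window=1024):
        """Average per-bit entropy (in bits) over non-overlapping windows of the trace."""
        totals = [0.0] * num_bits
        windows_seen = 0
        buf = []
        for addr in addresses:
            buf.append(addr)
            if len(buf) == window:
                for b in range(num_bits):
                    ones = sum((a >> b) & 1 for a in buf)
                    p = ones / window
                    if 0.0 < p < 1.0:
                        totals[b] += -(p * log2(p) + (1 - p) * log2(1 - p))
                windows_seen += 1
                buf = []
        return [t / windows_seen for t in totals] if windows_seen else totals

    # Bits with entropy close to 1.0 toggle often within a window and are
    # plausible candidates for bank/channel interleaving bits in a mapping.
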
CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators
TLDR
CoNDA is proposed, a coherence mechanism that lets an NDA optimistically execute an NDA kernel under the assumption that the NDA has all necessary coherence permissions; this optimistic execution allows CoNDA to gather information on the memory accesses performed by the NDA and by the rest of the system.
Automatic Sublining for Efficient Sparse Memory Accesses
TLDR
The Instruction Spatial Locality Estimator (ISLE) is proposed, a hardware detector that finds instructions accessing isolated words in a sea of unused data; such accesses are dynamically converted into uncached subline accesses, while regular accesses remain cached.
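
As a rough illustration of this kind of detector (not the ISLE hardware design itself), the sketch below tracks, for each load PC, how many distinct words of a fetched cache line are touched before eviction; PCs whose lines are mostly unused would be candidates for subline accesses. The line/word sizes, threshold, and all names are assumptions.

    # Illustrative software model of per-instruction spatial-locality tracking;
    # parameters and structure are assumed, not taken from the ISLE paper.
    from collections import defaultdict

    LINE_BYTES, WORD_BYTES = 64, 8

    class SpatialLocalityTracker:
        def __init__(self, subline_threshold=1.5):
            self.lines = {}                   # line address -> (allocating PC, touched word offsets)
            self.history = defaultdict(list)  # PC -> words touched in each evicted line
            self.subline_threshold = subline_threshold

        def on_fill(self, pc, addr):
            """A demand miss from instruction `pc` allocates the line holding `addr`."""
            self.lines[addr // LINE_BYTES] = (pc, set())

        def on_access(self, addr):
            line = addr // LINE_BYTES
            if line in self.lines:
                self.lines[line][1].add((addr % LINE_BYTES) // WORD_BYTES)

        def on_evict(self, addr):
            pc, words = self.lines.pop(addr // LINE_BYTES, (None, set()))
            if pc is not None:
                self.history[pc].append(len(words))

        def prefers_subline(self, pc):
            """True if lines brought in by `pc` typically have few touched words."""
            h = self.history[pc]
            return bool(h) and sum(h) / len(h) < self.subline_threshold
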
Demystifying memory access patterns of FPGA-based graph processing accelerators
TLDR
This work builds on a simulation environment for graph processing accelerators to make several existing accelerator approaches comparable, features a novel in-depth comparison, and yields insights into the strengths and weaknesses of current graph processing accelerators along these dimensions.
Non-Relational Databases on FPGAs: Survey, Design Decisions, Challenges
TLDR
This survey describes and categorizes the inherent differences and non-trivial trade-offs of relevant NRDS classes as well as their commonalities in the context of common design decisions when building such a system with FPGAs, and outlines the future of FPGA-accelerated NRDS.
Walter: Wide I/O Scaling of Number of Memory Controllers Versus Frequency and Voltage
TLDR
The findings show that the architectural benefits of Wide I/O with a larger number of memory controllers (MCs) coupled with wider ranks are promising when combined with voltage and frequency scaling (VFS), and that replacing ranks operating at specification frequencies with ranks operating at lower frequencies reduces temperature, likely allowing further rank stacking.

References

SHOWING 1-10 OF 177 REFERENCES
A performance comparison of contemporary DRAM architectures
TLDR
A simulation-based performance study of a representative group of contemporary DRAM architectures, each evaluated in a small-system organization, reveals that current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem.
A performance comparison of DRAM memory system optimizations for SMT processors
  • Zhichun Zhu, Zhao Zhang
  • 11th International Symposium on High-Performance Computer Architecture
  • 2005
TLDR
The use of SMT techniques has somewhat changed the context of DRAM optimizations but does not make them obsolete, and thread-aware DRAM access scheduling schemes may improve performance by up to 30% on workload mixes of memory-intensive applications.
Understanding and Improving the Latency of DRAM-Based Memory Systems
TLDR
The key conclusion of this dissertation is that augmenting DRAM architecture with simple and low-cost features, and developing a better understanding of manufactured DRAM chips together lead to significant memory latency reduction as well as energy efficiency improvement.
Evaluation of emerging memory technologies for HPC, data intensive applications
TLDR
The impact of emerging memory technologies on HPC and data-intensive workloads is evaluated using a model of a 5-level hybrid memory hierarchy; a combination of the two approaches, which essentially replaces the traditional DRAM with a small eDRAM or HMC cache between the last-level cache and the non-volatile memory, can provide capacity as well as improved performance and energy efficiency.
ChargeCache: Reducing DRAM latency by exploiting row access locality
TLDR
This work develops a low-cost mechanism, called ChargeCache, that enables faster access to recently-accessed rows in DRAM, with no modifications to DRAM chips, based on the key observation that a recently-accessed row has more charge and thus the following access to the same row can be performed faster.
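
The core idea lends itself to a small software model: a table of recently-activated rows where a hit means the row likely still holds enough charge for reduced activation timings. The sketch below is a simplification under assumed parameters (table size, LRU replacement, no charge-decay expiry), not the ChargeCache hardware design.

    # Simplified model of a "highly-charged rows" table; parameters are assumed.
    from collections import OrderedDict

    class HighlyChargedRowTable:
        def __init__(self, capacity=128):
            self.capacity = capacity
            self.rows = OrderedDict()  # (bank, row) -> True, ordered for LRU eviction

        def on_activate(self, bank, row):
            """Record an activation; True means the row was activated recently."""
            key = (bank, row)
            hit = key in self.rows
            if hit:
                self.rows.move_to_end(key)
            else:
                if len(self.rows) >= self.capacity:
                    self.rows.popitem(last=False)  # evict least-recently activated entry
                self.rows[key] = True
            return hit

    # A hit would let the memory controller use shortened tRCD/tRAS for this
    # activation; a miss falls back to standard DRAM timings. (The real mechanism
    # also invalidates entries after a charge-decay period.)
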
CROW: A Low-Cost Substrate for Improving DRAM Performance, Energy Efficiency, and Reliability
TLDR
This work proposes Copy-Row DRAM (CROW), a flexible substrate that enables new mechanisms for improving DRAM performance, energy efficiency, and reliability and uses the CROW substrate to implement a low-cost in-DRAM caching mechanism that lowers DRAM activation latency to frequently-accessed rows by 38% and a mechanism that avoids the use of short-retention-time rows to mitigate the performance and energy overhead of DRAM refresh operations.
Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization
TLDR
Flexible-LatencY DRAM (FLY-DRAM) is proposed, a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance by exploiting the spatial locality of slower cells within DRAM and accessing the faster DRAM regions with reduced latencies for the fundamental operations.
Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity
TLDR
This dissertation provides a detailed analysis of DRAM latency by using both circuit-level simulation with a detailed DRAM model and FPGA-based profiling of real DRAM modules, and proposes a new technique, Architectural-Variation-Aware DRAM (AVA-DRAM), which reduces DRAM latency at low cost.
Power and Performance Trade-Offs in Contemporary DRAM System Designs for Multicore Processors
TLDR
The results clearly show that DRAM system configuration choices, including page policy, power mode, device configuration, burst length, channel organization, and the selection of DRAM technology, significantly affect memory power consumption as well as performance.
DRAM errors in the wild: a large-scale field study
TLDR
Measurements of memory errors in a large fleet of commodity servers over a period of 2.5 years provide strong evidence that memory errors are dominated by hard errors, rather than soft errors, which previous work suspects to be the dominant error mode.
...