Analyzing CUDA workloads using a detailed GPU simulator

@article{Bakhoda2009AnalyzingCW,
  title={Analyzing CUDA workloads using a detailed GPU simulator},
  author={A. Bakhoda and George L. Yuan and Wilson W. L. Fung and H. Wong and Tor M. Aamodt},
  journal={2009 IEEE International Symposium on Performance Analysis of Systems and Software},
  year={2009},
  pages={163--174}
}
  • A. Bakhoda, George L. Yuan, Wilson W. L. Fung, H. Wong, Tor M. Aamodt
  • Published 2009
  • Computer Science
  • 2009 IEEE International Symposium on Performance Analysis of Systems and Software
  • Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow's manycore processors, whether those are GPUs or otherwise. [...] Key Result: Two observations we make are (1) that for the applications we study, performance is more sensitive to interconnect bisection bandwidth rather than latency, and (2) that, for some applications, running fewer threads concurrently than on-chip resources might otherwise allow…
    1,327 Citations
    Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems
    • 86
    A Quantitative Evaluation of Contemporary GPU Simulation Methodology
    • 1
    • Highly Influenced
    A Quantitative Evaluation of Contemporary GPU Simulation Methodology
    • 7
    Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+
    • 9
    • Highly Influenced
    MGPUSim: Enabling Multi-GPU Performance Modeling and Optimization
    • Y. Sun, Trinayan Baruah, +15 authors D. Kaeli
    • Computer Science
    • 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA)
    • 2019
    • 16
    Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs

    References

    Tradeoffs in designing accelerator architectures for visual computing
    • 48
    Program optimization space pruning for a multithreaded gpu
    • 296
    • Highly Influential
    Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
    • 457
    Accelerating Large Graph Algorithms on the GPU Using CUDA
    • 732
    A flexible simulation framework for graphics architectures
    • 79
    The microarchitecture of the Pentium 4 processor
    • 679
    ATTILA: a cycle-level execution-driven simulator for modern GPU architectures
    • 97
    Evaluating the Imagine stream architecture
    • 152