XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs

@inproceedings{Li2020XSPAP,
  title={{XSP}: Across-Stack Profiling and Analysis of Machine Learning Models on {GPUs}},
  author={C. Li and Abdul Dakkak and Jinjun Xiong and W. Wei and Lingjie Xu and W. Hwu},
  booktitle={2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
  year={2020},
  pages={326--327}
}
  • C. Li, Abdul Dakkak, +3 authors W. Hwu
  • Published 2020
  • Computer Science, Mathematics
  • 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
There has been a rapid proliferation of machine learning/deep learning (ML) models and their wide adoption across many application domains. This has made profiling and characterizing ML model performance an increasingly pressing task for both hardware designers and system providers, who aim to offer the best possible system for serving ML models that meet target latency, throughput, cost, and energy requirements while maximizing resource utilization. Such an endeavor is challenging as…
11 Citations
MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale
  • Cited by 1
Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design
Benchmarking Deep Learning for Time Series: Challenges and Directions
  • Cited by 4
Challenges for Building a Cloud Native Scalable and Trustable Multi-tenant AIoT Platform
  • Jinjun Xiong, Huamin Chen
  • Computer Science
  • 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
  • 2020
  • Cited by 1
RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads
  • Highly Influenced
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs
  • Cheng Li, Abdul Dakkak, Jinjun Xiong, W. Hwu
  • Computer Science, Mathematics
  • 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • 2020
  • Cited by 5

References

Showing 1-10 of 47 references
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning
  • Cited by 60
Fathom: reference workloads for modern deep learning methods
  • Cited by 113
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
  • Cited by 53
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
  • Cited by 412
Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures
  • Cited by 3
Caffe: Convolutional Architecture for Fast Feature Embedding
  • Cited by 12,742
Frustrated with Replicating Claims of a Shared Model? A Solution
  • Cited by 6
Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir
  • Cited by 228
SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance
  • Cited by 46