GPU-Qin: A methodology for evaluating the error resilience of GPGPU applications

Bo Fang, Karthik Pattabiraman, Matei Ripeanu, Sudhanva Gurumurthi
2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, which makes it difficult to achieve representativeness while being time-efficient. This paper makes…


Publications citing this paper.

Computer Safety, Reliability, and Security

Lecture Notes in Computer Science • 2016

Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) • 2017

A Survey of Techniques for Modeling and Improving Reliability of Computing Systems

IEEE Transactions on Parallel and Distributed Systems • 2016

Evaluating the impact of execution parameters on program vulnerability in GPU applications

2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) • 2018

Hamartia: A Fast and Accurate Error Injection Framework

2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W) • 2018


Publications referenced by this paper.

Parallel prefix sum (scan) with CUDA

M. Harris, S. Sengupta, J. D. Owens
GPU Gems 3, H. Nguyen, Ed. Addison Wesley, August 2007, ch. 39, pp. 851–876. • 2007

BLOCKWATCH: Leveraging similarity in parallel programs for error detection

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) • 2012

Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool

2012 International Conference for High Performance Computing, Networking, Storage and Analysis • 2012

Statistical fault injection-based AVF analysis of a GPU architecture

N. Farazmand, R. Ubal, D. Kaeli
IEEE Workshop on Silicon Errors in Logic • 2012

Analyzing soft-error vulnerability on GPGPU microarchitecture

2011 IEEE International Symposium on Workload Characterization (IISWC) • 2011

Component failure analysis using Neutron beam test

2010 17th IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits • 2010

Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing • 2010

Analyzing CUDA workloads using a detailed GPU simulator

2009 IEEE International Symposium on Performance Analysis of Systems and Software • 2009
