Strassen's Matrix Multiplication on GPUs

Authors: Junjie Li, Sanjay Ranka, Sartaj Sahni
Published in: 2011 IEEE 17th International Conference on Parallel and Distributed Systems

We provide efficient single-precision and integer GPU implementations of Strassen's algorithm and of Winograd's variant. On an NVIDIA C1060 GPU, when multiplying 16384×16384 matrices, Strassen's 4-level implementation achieves a speedup of 32% over the sgemm code in CUBLAS 3.0 (35% over the integer version of sgemm), and Winograd's variant achieves 33% (36%). The maximum numerical error for the single-precision implementations is about 2 orders of magnitude higher than that for sgemm…
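
For readers unfamiliar with the recursion behind these numbers, the following is a minimal CPU-only sketch of the textbook Strassen scheme with a bounded number of levels. It is not the authors' GPU code: the function names (`strassen`, `naive_mul`, `block_products`) and the list-of-lists matrix representation are assumptions made for illustration, and the base case uses a plain triple loop where a GPU implementation would call an optimized kernel.

```python
def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def naive_mul(a, b):
    """Classical matrix product, used as the base case of the recursion."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def split(m):
    """Split a square matrix of even size into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, levels):
    """Multiply square matrices using at most `levels` Strassen recursion levels."""
    n = len(a)
    if levels == 0 or n < 2 or n % 2 != 0:
        return naive_mul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)

    def rec(x, y):
        return strassen(x, y, levels - 1)

    # Seven block products replace the eight of the classical block algorithm.
    m1 = rec(mat_add(a11, a22), mat_add(b11, b22))
    m2 = rec(mat_add(a21, a22), b11)
    m3 = rec(a11, mat_sub(b12, b22))
    m4 = rec(a22, mat_sub(b21, b11))
    m5 = rec(mat_add(a11, a12), b22)
    m6 = rec(mat_sub(a21, a11), mat_add(b11, b12))
    m7 = rec(mat_sub(a12, a22), mat_add(b21, b22))

    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)

    # Stitch the four quadrants back into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def block_products(levels):
    """Base-case multiplications for Strassen versus classical blocking."""
    return 7 ** levels, 8 ** levels
```

With 4 levels, `block_products(4)` gives 2401 base-case products against 4096 for classical blocking, about 59% as many. The extra block additions and subtractions, together with memory traffic, account for part of the gap between that ratio and a measured speedup, and the repeated additions and subtractions are also where extra rounding error can enter in floating point.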