Strassen's Matrix Multiplication on GPUs

@inproceedings{Li2011StrassensMM,
  title={Strassen's Matrix Multiplication on GPUs},
  author={Junjie Li and Sanjay Ranka and Sartaj Sahni},
  booktitle={2011 IEEE 17th International Conference on Parallel and Distributed Systems},
  year={2011},
  pages={157--164}
}
We provide efficient single-precision and integer GPU implementations of Strassen's algorithm as well as of Winograd's variant. On an NVIDIA C1060 GPU, a speedup of 32% (35%) is obtained for Strassen's 4-level implementation and 33% (36%) for Winograd's variant relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0 when multiplying 16384×16384 matrices. The maximum numerical error for the single-precision implementations is about 2 orders of magnitude higher than that for sgemm…
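The abstract refers to a "4-level" Strassen implementation. As a rough illustration of what a level count means, the following is a minimal CPU sketch in pure Python, not the paper's GPU code. The function names (`classical_mul`, `strassen`) and the `levels` parameter are hypothetical, chosen for this example. Each level replaces 8 half-size block products with 7; once the level budget is used up, or the size is odd, the sketch falls back to the classical product, which is where a GPU implementation would presumably call a tuned routine such as sgemm.

```python
def mat_add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def classical_mul(A, B):
    """Classical O(n^3) product, used as the base case."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, levels):
    """Multiply square matrices using at most `levels` Strassen recursions.

    Hypothetical helper for illustration only: falls back to the classical
    product when the level budget is exhausted or the size is odd.
    """
    n = len(A)
    if levels == 0 or n % 2 != 0:
        return classical_mul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    def rec(X, Y):
        return strassen(X, Y, levels - 1)

    # The seven block products of Strassen's algorithm.
    M1 = rec(mat_add(A11, A22), mat_add(B11, B22))
    M2 = rec(mat_add(A21, A22), B11)
    M3 = rec(A11, mat_sub(B12, B22))
    M4 = rec(A22, mat_sub(B21, B11))
    M5 = rec(mat_add(A11, A12), B22)
    M6 = rec(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = rec(mat_sub(A12, A22), mat_add(B21, B22))

    # Combine the products into the four quadrants of C.
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_add(mat_sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


if __name__ == "__main__":
    n = 8
    A = [[(3 * i + 5 * j) % 7 - 3 for j in range(n)] for i in range(n)]
    B = [[(2 * i + j) % 5 - 2 for j in range(n)] for i in range(n)]
    print(strassen(A, B, 2) == classical_mul(A, B))
```

With integer inputs the result matches the classical product exactly; with floating-point inputs the extra additions and subtractions introduce rounding differences, which is the kind of numerical error the abstract compares against sgemm.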


References

Publications referenced by this paper.

GPU Matrix Multiplication, chapter in Handbook on Multicore Computing

  • J. Li, S. Ranka, S. Sahni
  • 2011

Toward an optimal algorithm for matrix multiplication

  • S. Robinson
  • SIAM News
  • 2005

Data Structures, Algorithms, and Applications in C++

  • S. Sahni
  • 2004
