A Proposed API for Batched Basic Linear Algebra Subprograms

@inproceedings{Dongarra2016APA,
  title={A Proposed API for Batched Basic Linear Algebra Subprograms},
  author={Jack J. Dongarra and Iain S. Duff and Mark Gates and Azzam Haidar and Sven Hammarling and Nicholas J. Higham and Jonathan D. Hogg and Pedro Valero-Lara and Samuel D. Relton and Stanimire Tomov and Mawussi Zounon},
  year={2016}
}
This paper proposes an API for Batched Basic Linear Algebra Subprograms (Batched BLAS). We focus on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The aim is to provide more efficient, yet portable, implementations of algorithms on high-performance manycore architectures (such as multicore and manycore CPUs, GPUs, and coprocessors).
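
To make the idea of grouping many small, independent BLAS operations into one call concrete, here is a minimal pure-Python sketch of a batched matrix-matrix multiply. The names `gemm` and `batched_gemm`, their argument order, and the list-of-lists matrix layout are assumptions made for illustration; they are not the interface proposed in the paper, and a real implementation would dispatch the independent problems to tuned parallel kernels rather than loop over them.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> None:
    """Naive in-place GEMM on one small problem: C <- alpha * A @ B + beta * C."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = alpha * acc + beta * c[i][j]


def batched_gemm(
    alpha: float,
    a_batch: Sequence[Matrix],
    b_batch: Sequence[Matrix],
    beta: float,
    c_batch: Sequence[Matrix],
) -> None:
    """Hypothetical batched wrapper: one call covers many independent GEMMs."""
    if not (len(a_batch) == len(b_batch) == len(c_batch)):
        raise ValueError("all batches must contain the same number of matrices")
    # The problems do not depend on each other, so an optimized library
    # could run them concurrently; this sketch simply iterates.
    for a, b, c in zip(a_batch, b_batch, c_batch):
        gemm(alpha, a, b, beta, c)
```

The caller passes whole batches and gets every `C` matrix updated in place by a single call, which is the property that lets a library amortize call overhead and schedule the small problems together.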
