• Corpus ID: 30747376

Performance Optimization for the K-Nearest Neighbors Kernel using Strassen's Algorithm

@inproceedings{Rice2017PerformanceOF,
  title={Performance Optimization for the K-Nearest Neighbors Kernel using Strassen's Algorithm},
  author={Leslie Rice},
  year={2017}
}
Strassen’s algorithm computes matrix-matrix multiplication using only 7 multiplications rather than the usual 8. Recent advances have shown the benefit of using Strassen’s algorithm to improve the performance of general matrix-matrix multiplication (GEMM) for matrices of varying shapes and sizes. These advances have created an opportunity to incorporate Strassen’s algorithm into other matrix-matrix multiplication-like operations. In this paper, we do so for the GSKNN (General…
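For reference, here is a minimal one-level sketch of Strassen's scheme in NumPy. This is the generic textbook formulation, shown only to illustrate the 7-versus-8 multiplication count; it is not the blocked, BLIS-based implementation the paper builds on, and the function name strassen_one_level is a hypothetical helper.

```python
import numpy as np

def strassen_one_level(A, B):
    """One level of Strassen's scheme for square matrices with even dimension."""
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n % 2 == 0
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven block products instead of the usual eight; each block product
    # could itself be a conventional GEMM or another level of Strassen.
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)

    # Reassemble the result blocks using only additions and subtractions.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

# Sanity check against the conventional product.
A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(strassen_one_level(A, B), A @ B)
```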

References

Performance optimization for the k-nearest neighbors kernel on x86 architectures
TLDR
This work proposes an efficient implementation of the kNN kernel on x86 architectures, presents a performance analysis of the algorithm along with guidance on parameter selection, and observes significant speedups when searching for 16 neighbors in a dataset of 1.6 million points in 64 dimensions.
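As a rough illustration of why this kernel is "matrix-matrix multiplication-like", a generic brute-force sketch is shown below: the pairwise squared distances ||q − x||² = ||q||² + ||x||² − 2 q·x are dominated by the −2 Q Xᵀ term, which is a GEMM. This is not the GSKNN implementation from the cited paper; it is only meant to show where a GEMM (and hence a Strassen-based multiply) enters, and the function name knn_bruteforce is hypothetical.

```python
import numpy as np

def knn_bruteforce(Q, X, k):
    """Indices of the k nearest reference points in X for each query row of Q."""
    # GEMM-like core: the m-by-n matrix of inner products Q @ X^T.
    inner = Q @ X.T
    d2 = (Q ** 2).sum(axis=1)[:, None] + (X ** 2).sum(axis=1)[None, :] - 2.0 * inner
    # Neighbor selection; a tuned kernel would fuse this with the multiply.
    return np.argsort(d2, axis=1)[:, :k]

Q = np.random.rand(100, 64)       # queries, 64 dimensions
X = np.random.rand(10000, 64)     # reference points
neighbors = knn_bruteforce(Q, X, k=16)   # shape (100, 16)
```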
Strassen's Algorithm Reloaded
TLDR
The practical implementation of Strassen's algorithm for matrix-matrix multiplication (DGEMM) requires no workspace beyond the buffers already incorporated into conventional high-performance DGEMM implementations and can be plug-compatible with the standard DGEMM interface.
Generating Families of Practical Fast Matrix Multiplication Algorithms
TLDR
This study shows that Strassen-like fast matrix multiplication can be incorporated into libraries for practical use and demonstrates a performance benefit over conventional GEMM on single-core and multi-core systems.
A framework for practical parallel fast matrix multiplication
  • Austin R. Benson, Grey Ballard
  • Computer Science
    Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
  • 2015
TLDR
It is shown that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and Strassen's fast algorithm on modest problem sizes and shapes, and that the best choice of fast algorithm depends not only on the size of the matrices but also on their shape.
Gaussian elimination is not optimal
Below we will give an algorithm which computes the coefficients of the product of two square matrices A and B of order n from the coefficients of A and B with less than 4.7 · n^(log2 7) arithmetical operations.
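For context, the exponent in that bound follows from the standard divide-and-conquer recurrence for Strassen's scheme (a textbook derivation, not part of the cited abstract): T(n) = 7·T(n/2) + Θ(n²), which resolves to T(n) = Θ(n^(log2 7)) ≈ Θ(n^2.81), versus Θ(n³) for the conventional algorithm.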
BLIS: A Framework for Rapidly Instantiating BLAS Functionality
TLDR
Preliminary performance of level-2 and level-3 operations is observed to be competitive with two mature open source libraries (OpenBLAS and ATLAS) as well as an established commercial product (Intel MKL).
Anatomy of high-performance matrix multiplication
We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified
A set of level 3 basic linear algebra subprograms
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-matrix operations that should provide for efficient and portable
…, and Robert A. van de Geijn. Strassen's algorithm reloaded
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '16, pages …