• Corpus ID: 30747376

Performance Optimization for the K-Nearest Neighbors Kernel using Strassen's Algorithm

Leslie Rice
Strassen's algorithm multiplies two 2×2 matrices (or two matrices partitioned into 2×2 grids of blocks) using only 7 multiplications rather than the usual 8. Recent advances have shown the benefit of using Strassen's algorithm to improve the performance of general matrix-matrix multiplication (GEMM) for matrices of varying shapes and sizes. These advances have created an opportunity to incorporate Strassen's algorithm into other operations that resemble matrix-matrix multiplication. In this paper, we do so for the GSKNN (General…
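The seven-product scheme can be illustrated with a minimal NumPy sketch of one level of Strassen's algorithm, assuming square matrices of even order. The function name `strassen_one_level` is hypothetical, and each of the seven block products simply falls back to NumPy's `@`; practical implementations instead recurse or call an optimized GEMM.

```python
import numpy as np

def strassen_one_level(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Multiply two square matrices of even order with one level of
    Strassen's algorithm: 7 block products instead of the classical 8."""
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n % 2 == 0, "even square matrices only"
    h = n // 2
    # Partition A and B into 2x2 grids of h-by-h blocks (views, no copies).
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven block products; here each one uses the classical product (@).
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)
    # Recombine the products into the four blocks of C using only additions.
    C = np.empty((n, n), dtype=np.result_type(A, B))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))
print(np.allclose(strassen_one_level(A, B), A @ B))  # True
```

Applying the same step recursively to the block products is what yields asymptotic savings; in practice the recursion stops at a size where the classical product is faster.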



Performance optimization for the k-nearest neighbors kernel on x86 architectures
This work proposes an efficient implementation and performance analysis of the kNN kernel on x86 architectures, explains how the algorithm's parameters are selected, and reports significant speedups when searching for 16 neighbors in a dataset of 1.6 million points in 64 dimensions.
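A kNN kernel resembles GEMM because squared Euclidean distances expand as ‖q − r‖² = ‖q‖² + ‖r‖² − 2 qᵀr, so the inner products for all query/reference pairs form a single matrix product. The plain NumPy sketch below illustrates that formulation only; `knn_gemm` is a hypothetical name, and unlike an optimized kernel it materializes the full distance matrix before selecting neighbors.

```python
import numpy as np

def knn_gemm(queries: np.ndarray, refs: np.ndarray, k: int):
    """Return (squared distances, indices) of the k nearest reference points
    for each query row, with the pairwise inner products done as one GEMM."""
    assert 1 <= k <= refs.shape[0], "k must not exceed the number of references"
    # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r ; the q.r terms are queries @ refs.T
    q_norms = np.einsum("ij,ij->i", queries, queries)[:, None]   # shape (m, 1)
    r_norms = np.einsum("ij,ij->i", refs, refs)[None, :]         # shape (1, n)
    d2 = q_norms + r_norms - 2.0 * (queries @ refs.T)            # shape (m, n)
    np.maximum(d2, 0.0, out=d2)  # clamp tiny negatives caused by rounding
    # Pick the k smallest entries per row without a full sort ...
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # ... then order those k candidates by distance.
    order = np.argsort(np.take_along_axis(d2, idx, axis=1), axis=1)
    idx = np.take_along_axis(idx, order, axis=1)
    return np.take_along_axis(d2, idx, axis=1), idx
```

The expensive part, `queries @ refs.T`, is exactly the kind of matrix-matrix product that a fast algorithm such as Strassen's could accelerate.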
Strassen's Algorithm Reloaded
The practical implementation of Strassen's algorithm for double-precision matrix-matrix multiplication (DGEMM) requires no workspace beyond the buffers already incorporated into conventional high-performance DGEMM implementations, and it can be plug-compatible with the standard DGEMM interface.
Gaussian elimination is not optimal
Below we will give an algorithm which computes the coefficients of the product of two square matrices $A$ and $B$ of order $n$ from the coefficients of $A$ and $B$ with less than $4.7 \cdot n^{\log_2 7}$ arithmetical operations…
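A short derivation shows where the exponent $\log_2 7$ comes from. It counts only scalar multiplications and assumes $n = 2^k$ with recursion all the way down to $1 \times 1$ blocks; the constant 4.7 in the bound above also accounts for additions and is not derived here.

```latex
\begin{aligned}
M(1) &= 1, \qquad M(n) = 7\,M(n/2) \quad \text{for } n = 2^k,\\
M(n) &= 7^{k} = 7^{\log_2 n} = n^{\log_2 7} \approx n^{2.807},\\
\text{classical algorithm: } & 8^{k} = n^{3}.
\end{aligned}
```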
A set of level 3 basic linear algebra subprograms
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-matrix operations that should provide for efficient and portable…
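The central Level 3 operation is the GEMM update $C := \alpha\,\mathrm{op}(A)\,\mathrm{op}(B) + \beta C$, where $\mathrm{op}(X)$ is $X$ or its transpose. The unoptimized Python sketch below only pins down those semantics; the names `gemm_reference`, `trans_a`, and `trans_b` are hypothetical, whereas the Fortran routine DGEMM takes `TRANSA`/`TRANSB` character flags plus explicit dimensions and leading dimensions.

```python
import numpy as np

def gemm_reference(alpha, A, B, beta, C, trans_a=False, trans_b=False):
    """Reference triple-loop version of C := alpha * op(A) @ op(B) + beta * C,
    where op(X) is X or X^T. C is overwritten in place, as in BLAS."""
    opA = A.T if trans_a else A
    opB = B.T if trans_b else B
    m, k = opA.shape
    k2, n = opB.shape
    assert k == k2 and C.shape == (m, n), "inconsistent dimensions"
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):  # inner product of row i of op(A) and column j of op(B)
                acc += opA[i, p] * opB[p, j]
            C[i, j] = alpha * acc + beta * C[i, j]
    return C
```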
Generating Families of Practical Fast Matrix Multiplication Algorithms
This study shows that Strassen-like fast matrix multiplication can be incorporated into libraries for practical use and demonstrates a performance benefit over conventional GEMM on single core and multi-core systems.
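Strassen-like algorithms are commonly described by three coefficient matrices $U$, $V$, $W$: product $M_r$ multiplies a linear combination of $A$ blocks (column $r$ of $U$) by a linear combination of $B$ blocks (column $r$ of $V$), and column $r$ of $W$ says how $M_r$ is added into each $C$ block. The sketch below encodes Strassen's rank-7 algorithm this way and runs one level of it; the executor `fast_matmul` is a hypothetical name, and swapping in other coefficient matrices is what lets one driver serve a whole family of algorithms.

```python
import numpy as np

# Strassen's <2,2,2> algorithm in coefficient form. Blocks are numbered
# row-major: X = [[X0, X1], [X2, X3]] for X in {A, B, C}; columns are M1..M7.
U = np.array([[1, 0, 1, 0, 1, -1, 0],
              [0, 0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 0, 1, 0],
              [1, 1, 0, 1, 0, 0, -1]])
V = np.array([[1, 1, 0, -1, 0, 1, 0],
              [0, 0, 1, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 0, 1],
              [1, 0, -1, 0, 1, 0, 1]])
W = np.array([[1, 0, 0, 1, -1, 0, 1],
              [0, 0, 1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0, 0, 0],
              [1, -1, 1, 0, 0, 1, 0]])

def fast_matmul(A, B, U, V, W):
    """One level of a <2,2,2> fast algorithm given by coefficients U, V, W."""
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n % 2 == 0, "even square matrices only"
    h = n // 2

    def blocks(X):
        return [X[:h, :h], X[:h, h:], X[h:, :h], X[h:, h:]]

    Ab, Bb = blocks(A), blocks(B)
    C = np.zeros((n, n), dtype=np.result_type(A, B))
    Cb = blocks(C)  # views into C, so in-place updates fill C
    for r in range(U.shape[1]):
        S = sum(U[i, r] * Ab[i] for i in range(4) if U[i, r] != 0)
        T = sum(V[j, r] * Bb[j] for j in range(4) if V[j, r] != 0)
        M = S @ T  # the r-th block product
        for p in range(4):
            if W[p, r] != 0:
                Cb[p] += W[p, r] * M
    return C
```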
Anatomy of high-performance matrix multiplication
We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified…
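The loop structure commonly described for Goto-style GEMM (and used in BLIS) can be sketched in NumPy: cache-sized blocks of $B$ and $A$ are copied ("packed") and a small $m_R \times n_R$ block of $C$ is updated by a micro-kernel. The blocking constants below are hypothetical placeholders rather than tuned values, the copies only stand in for real packing (which also reorders data into contiguous micro-panels), and the slicing assumes `MR` divides `MC` and `NR` divides `NC`.

```python
import numpy as np

# Hypothetical blocking parameters; real libraries tune them to the caches
# and registers of the target CPU. Assumes MR | MC and NR | NC.
NC, KC, MC, NR, MR = 64, 32, 16, 4, 4

def blocked_gemm(A, B, C):
    """C += A @ B using a Goto/BLIS-style loop ordering (illustrative sketch)."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2 and C.shape == (m, n), "inconsistent dimensions"
    for jc in range(0, n, NC):                          # loop 5: column panels of B and C
        for pc in range(0, k, KC):                      # loop 4: rank-KC updates
            Bp = np.array(B[pc:pc + KC, jc:jc + NC])    # "pack" a KC x NC panel of B
            for ic in range(0, m, MC):                  # loop 3: row blocks of A and C
                Ap = np.array(A[ic:ic + MC, pc:pc + KC])  # "pack" an MC x KC block of A
                for jr in range(0, Bp.shape[1], NR):      # loop 2
                    for ir in range(0, Ap.shape[0], MR):  # loop 1
                        # Micro-kernel: update an MR x NR block of C.
                        C[ic + ir:ic + ir + MR, jc + jr:jc + jr + NR] += (
                            Ap[ir:ir + MR, :] @ Bp[:, jr:jr + NR])
    return C
```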
BLIS: A Framework for Rapidly Instantiating BLAS Functionality
Preliminary performance of level-2 and level-3 operations is observed to be competitive with two mature open source libraries (OpenBLAS and ATLAS) as well as an established commercial product (Intel MKL).
A framework for practical parallel fast matrix multiplication
  • Austin R. Benson, Grey Ballard
    Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
  • 2015
It is shown that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and Strassen's fast algorithm on modest problem sizes and shapes and that the best choice of fast algorithm depends not only on the size of the matrices but also the shape.
…, and Robert A. van de Geijn. Strassen's algorithm reloaded.
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '16, pages …