# Performance optimization for the k-nearest neighbors kernel on x86 architectures

@article{Yu2015PerformanceOF, title={Performance optimization for the k-nearest neighbors kernel on x86 architectures}, author={Chenhan D. Yu and Jianyu Huang and Woody Austin and Bo Xiao and George Biros}, journal={SC15: International Conference for High Performance Computing, Networking, Storage and Analysis}, year={2015}, pages={1-12} }

Nearest neighbor search is a cornerstone problem in computational geometry, non-parametric statistics, and machine learning. For N points, exhaustive search requires quadratic work, but many fast algorithms reduce the complexity for exact and approximate searches. The common kernel (kNN kernel) in all these algorithms solves many small-size problems exactly using exhaustive search. We propose an efficient implementation and performance analysis for the kNN kernel on x86 architectures. By fusing…
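As a rough illustration of what "solving a small problem exactly using exhaustive search" means, the sketch below computes the k nearest reference points for each query by evaluating every pairwise squared distance and keeping the k smallest in a bounded max-heap. It is a plain-Python sketch under stated assumptions, not the paper's optimized x86 implementation: the function name `knn_exhaustive`, its argument layout, and the use of the expansion ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q·r are illustrative choices.

```python
import heapq


def knn_exhaustive(queries, references, k):
    """For each query, return the k nearest references as sorted
    (squared_distance, reference_index) pairs.

    Hypothetical helper for illustration; not the paper's kernel.
    """
    # Squared norms of the references are computed once and reused.
    ref_norms = [sum(v * v for v in r) for r in references]
    results = []
    for q in queries:
        q_norm = sum(v * v for v in q)
        # Max-heap of at most k candidates, emulated by negating distances:
        # heap[0] always holds the current worst (largest) kept distance.
        heap = []
        for j, r in enumerate(references):
            dot = sum(a * b for a, b in zip(q, r))
            d2 = q_norm + ref_norms[j] - 2.0 * dot
            if len(heap) < k:
                heapq.heappush(heap, (-d2, j))
            elif -d2 > heap[0][0]:
                # Closer than the current worst candidate: replace it.
                heapq.heapreplace(heap, (-d2, j))
        results.append(sorted((-neg, j) for neg, j in heap))
    return results
```

The work is proportional to the number of queries times the number of references times the dimension, which is the quadratic cost the abstract refers to when queries and references are the same N points.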

## 27 Citations

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

- Computer Science
- 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- 2016

This paper presents parallel and highly optimized kd-tree-based KNN algorithms (both construction and querying) that are suitable for distributed architectures and outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of the implementation to state-of-the-art Big Data analytics problems.

Performance Optimization for the K-Nearest Neighbors Kernel using Strassen’s Algorithm

- Computer Science
- 2017

This paper incorporates Strassen’s algorithm, which performs matrix-matrix multiplication using only 7 multiplications rather than the usual 8, into the GSKNN (General Stride k Nearest Neighbors) operation, a kernel for nearest neighbor search, and reports a performance speedup for high-dimensional datasets.
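To make the "7 multiplications rather than the usual 8" concrete, here is a minimal sketch of one level of the classical Strassen scheme for a 2×2 product. It only illustrates the arithmetic identity; the function name `strassen_2x2` is hypothetical, and nothing here reflects how the cited paper integrates Strassen into GSKNN. In practice the entries would be matrix blocks rather than scalars.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications (classical Strassen).

    Illustrative only; entries may be any objects supporting +, - and *.
    """
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # The seven products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine the products using additions and subtractions only.
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6
    return [[c11, c12], [c21, c22]]
```

The saved multiplication is paid for with extra additions, which is why the benefit shows up only once the blocks are large enough.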

NCAM: Near-Data Processing for Nearest Neighbor Search

- Computer Science
- MEMSYS
- 2015

To enable large-scale computer vision, a new class of associative memories called NCAMs are proposed which encapsulate logic with memory to accelerate k-nearest neighbors and can improve the performance of kNN by orders of magnitude.

A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression

- Computer Science
- 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
- 2018

Results confirm that — with correct tuning of the hyperparameters — classification using kernel ridge regression with the compressed matrix does not lose prediction accuracy compared to the exact — not compressed — kernel matrix and that the approach can be extended to O(1M) datasets, for which computation with the full kernel matrix becomes prohibitively expensive.

Vector and Line Quantization for Billion-scale Similarity Search on GPUs

- Computer Science
- Future Gener. Comput. Syst.
- 2019

Scalable Approximate FRNN-OWA Classification

- Computer Science
- IEEE Transactions on Fuzzy Systems
- 2020

This work proposes approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbors returned by hierarchical navigable small worlds (HNSW), a recent approximative nearest neighbor search algorithm with logarithmic query time complexity at constant near-100% accuracy.

An improved K-Nearest neighbour with grasshopper optimization algorithm for imputation of missing data

- Computer Science
- International Journal of Advances in Intelligent Informatics
- 2021

A novel method for imputation of missing data, named KNNGOA, optimizes the KNN imputation technique using the grasshopper optimization algorithm and achieves promising experimental results, outperforming other methods, especially in terms of accuracy.

A kernel-independent FMM in general dimensions

- Computer Science
- SC15: International Conference for High Performance Computing, Networking, Storage and Analysis
- 2015

A general-dimensional, kernel-independent, algebraic fast multipole method that only requires kernel evaluations and scales well with the problem size, the number of processors, and the ambient dimension---as long as the intrinsic dimension of the dataset is small.

Optimizing GPGPU Kernel Summation for Performance and Energy Efficiency

- Computer Science
- 2016 45th International Conference on Parallel Processing Workshops (ICPPW)
- 2016

This paper decomposes the kernel summation problem into individual tasks with few dependencies and strikes a balance between finer-grained parallelism and reduced data replication, achieving a speedup of up to 1.8X and saving up to 33% of total energy in all tested problem sizes.

Strassen’s Algorithm Reloaded on GPUs

- Computer Science
- ACM Trans. Math. Softw.
- 2020

A performance model for NVIDIA Volta GPUs is developed to select the appropriate blocking parameters and predict the performance for gemm and Strassen, and an implementation is developed that can achieve up to 1.11× speedup with a crossover point as small as 1,536 compared to cublasSgemm on an NVIDIA Tesla V100 GPU.

## References

SHOWING 1-10 OF 43 REFERENCES

Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search

- Computer Science
- 2013 IEEE International Conference on Computer Vision
- 2013

Two solutions for both the nearest neighbors and range search problems are proposed, along with a scheme that learns the parameters in a learning stage, adapting them to the case of a set of points with low intrinsic dimension that are embedded in a high-dimensional space.

Scalable Nearest Neighbor Algorithms for High Dimensional Data

- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2014

It is shown that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and an automated configuration procedure for finding the best algorithm to search a particular data set is described.

Fast k nearest neighbor search using GPU

- Computer Science
- 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
- 2008

A CUDA implementation of the “brute force” kNN search is presented; it shows a speed increase on synthetic and real data of up to one or two orders of magnitude, depending on the data, with quasi-linear behavior with respect to the data size in a given, practical range.

Parallel search of k-nearest neighbors with synchronous operations

- Computer Science
- 2012 IEEE Conference on High Performance Extreme Computing
- 2012

A cohort of truncated sort algorithms for parallel kNN search, including the truncated bitonic sort (TBiS) in particular, which has desirable data locality, synchronous concurrency and simple data and program structures.

ANN: library for approximate nearest neighbor searching

- Computer Science
- 1998

ANN is a library of C++ objects and procedures that supports approximate nearest neighbor searching, and is written as a testbed for a class of nearest neighbour searching algorithms, particularly those based on orthogonal decompositions of space.

Randomized approximate nearest neighbors algorithm

- Computer Science
- Proceedings of the National Academy of Sciences
- 2011

A randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space that utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search is presented.

Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions

- Computer Science
- 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006

We present an algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of $O(dn^{1/c^2+o(1)})$ and space $O(dn + n^{1+1/c^2+o(1)})$. This almost matches…

Poster: parallel algorithms for clustering and nearest neighbor search problems in high dimensions

- Computer Science
- SC '11 Companion
- 2011

This work seeks to develop a set of algorithms that will provide unprecedented scalability and performance for clustering and nearest neighbor searches in high dimensions.

Efficient implementation of sorting on multi-core SIMD CPU architecture

- Computer Science
- Proc. VLDB Endow.
- 2008

An efficient implementation and detailed analysis of MergeSort on current CPU architectures, and performance scalability of the proposed sorting algorithm with respect to certain salient architectural features of modern chip multiprocessor (CMP) architectures, including SIMD width and core-count.

The influence of caches on the performance of heaps

- Computer Science
- JEAL
- 1996

This paper investigates the cache performance of implicit heaps and presents an analytical model called collective analysis that allows cache performance to be predicted as a function of both cache configuration and algorithm configuration.