Performance optimization for the k-nearest neighbors kernel on x86 architectures

  • Chenhan D. Yu, Jianyu Huang, Woody Austin, Bo Xiao, George Biros
  • Published 15 November 2015
  • Computer Science
  • SC15: International Conference for High Performance Computing, Networking, Storage and Analysis
Nearest neighbor search is a cornerstone problem in computational geometry, non-parametric statistics, and machine learning. For N points, exhaustive search requires quadratic work, but many fast algorithms reduce the complexity for exact and approximate searches. The common kernel (kNN kernel) in all these algorithms solves many small-size problems exactly using exhaustive search. We propose an efficient implementation and performance analysis for the kNN kernel on x86 architectures. By fusing… 
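As a rough illustration of what "solving a small problem exactly using exhaustive search" means, here is a minimal pure-Python sketch. It is not the paper's optimized x86 implementation: the function name `knn_exhaustive`, the use of squared Euclidean distance, and the sample points are all assumptions made for this example.

```python
import heapq


def knn_exhaustive(queries, references, k):
    """For each query, return its k nearest references as (squared_distance, index) pairs.

    Hypothetical helper for illustration; assumes squared Euclidean distance.
    """
    results = []
    for q in queries:
        # Exhaustive part: compute the distance from q to every reference point.
        dists = (
            (sum((qi - ri) ** 2 for qi, ri in zip(q, r)), j)
            for j, r in enumerate(references)
        )
        # Selection part: keep only the k smallest distances without a full sort.
        results.append(heapq.nsmallest(k, dists))
    return results


# Made-up sample data: 4 reference points and 2 query points in 2 dimensions.
refs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
queries = [(0.9, 0.1), (2.5, 2.5)]
neighbors = knn_exhaustive(queries, refs, k=2)
neighbor_ids = [[j for _, j in row] for row in neighbors]
```

For m queries, n references, and dimension d, the distance step costs on the order of m·n·d operations, which is why the abstract describes the exhaustive approach as quadratic when every point is searched against every other point.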

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures
This paper presents parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures; the implementation outperforms earlier ones by more than an order of magnitude, thereby radically improving its applicability to state-of-the-art Big Data analytics problems.
Performance Optimization for the K-Nearest Neighbors Kernel using Strassen’s Algorithm
This paper incorporates Strassen’s algorithm, which performs matrix-matrix multiplication using only 7 multiplications rather than the usual 8, into the GSKNN (General Stride k-Nearest Neighbors) operation, a kernel for nearest neighbor search; the result is a performance speedup for high-dimensional datasets.
NCAM: Near-Data Processing for Nearest Neighbor Search
To enable large-scale computer vision, a new class of associative memories called NCAMs is proposed; NCAMs encapsulate logic with memory to accelerate k-nearest neighbors and can improve the performance of kNN by orders of magnitude.
A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression
Results confirm that — with correct tuning of the hyperparameters — classification using kernel ridge regression with the compressed matrix does not lose prediction accuracy compared to the exact — not compressed — kernel matrix and that the approach can be extended to O(1M) datasets, for which computation with the full kernel matrix becomes prohibitively expensive.
Vector and Line Quantization for Billion-scale Similarity Search on GPUs
Scalable Approximate FRNN-OWA Classification
This work proposes approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbors returned by hierarchical navigable small worlds (HNSW), a recent approximative nearest neighbor search algorithm with logarithmic query time complexity at constant near-100% accuracy.
An improved K-Nearest neighbour with grasshopper optimization algorithm for imputation of missing data
A novel method for imputation of missing data, named KNNGOA, optimizes the KNN imputation technique using the grasshopper optimization algorithm; in the experiments conducted it outperformed other methods, especially in terms of accuracy.
A kernel-independent FMM in general dimensions
A general-dimensional, kernel-independent, algebraic fast multipole method that only requires kernel evaluations and scales well with the problem size, the number of processors, and the ambient dimension, as long as the intrinsic dimension of the dataset is small.
Optimizing GPGPU Kernel Summation for Performance and Energy Efficiency
This paper decomposes the kernel summation problem into individual tasks with few dependencies, strikes a balance between finer-grained parallelism and reduced data replication, achieves a speedup of up to 1.8X, and saves up to 33% of total energy in all tested problem sizes.
Strassen’s Algorithm Reloaded on GPUs
A performance model for NVIDIA Volta GPUs is developed to select the appropriate blocking parameters and predict the performance for gemm and Strassen; the resulting implementation can achieve up to 1.11× speedup, with a crossover point as small as 1,536, compared to cublasSgemm on an NVIDIA Tesla V100 GPU.
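The "7 multiplications rather than the usual 8" mentioned in the Strassen entries above can be sketched for a single 2×2 product. This is only the textbook one-level scheme on scalar entries, written in plain Python; it is not code from any of the listed papers, and the function name `strassen_2x2` is an assumption for this example.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices (nested lists) using Strassen's 7 products."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven products; a conventional 2x2 multiply would need eight.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine the products into the four entries of C = A * B.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


product = strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

In practice the entries are themselves matrix blocks rather than scalars, so each saved multiplication is a saved block product, paid for with extra block additions.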


Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
Two solutions are proposed for the nearest neighbor and range search problems, along with a scheme that learns the parameters in a learning stage, adapting them to the case of a set of points with low intrinsic dimension embedded in a high-dimensional space.
Scalable Nearest Neighbor Algorithms for High Dimensional Data
  • Marius Muja, D. Lowe
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2014
It is shown that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and an automated configuration procedure for finding the best algorithm to search a particular data set is described.
Fast k nearest neighbor search using GPU
A CUDA implementation of the “brute force” kNN search is presented; it shows a speed increase on synthetic and real data of up to one or two orders of magnitude depending on the data, with quasi-linear behavior with respect to the data size in a given, practical range.
Parallel search of k-nearest neighbors with synchronous operations
A cohort of truncated sort algorithms for parallel kNN search, including the truncated bitonic sort (TBiS) in particular, which has desirable data locality, synchronous concurrency and simple data and program structures.
ANN: library for approximate nearest neighbor searching
ANN is a library of C++ objects and procedures that supports approximate nearest neighbor searching, and is written as a testbed for a class of nearest neighbour searching algorithms, particularly those based on orthogonal decompositions of space.
Randomized approximate nearest neighbors algorithm
A randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space that utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search is presented.
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
  • Alexandr Andoni, P. Indyk
  • Computer Science
    2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
  • 2006
We present an algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of O(dn^{1/c^2+o(1)}) and space O(dn + n^{1+1/c^2+o(1)}). This almost matches
Poster: parallel algorithms for clustering and nearest neighbor search problems in high dimensions
This work seeks to develop a set of algorithms that will provide unprecedented scalability and performance for clustering and nearest neighbor searches in high dimensions.
Efficient implementation of sorting on multi-core SIMD CPU architecture
An efficient implementation and detailed analysis of MergeSort on current CPU architectures, and performance scalability of the proposed sorting algorithm with respect to certain salient architectural features of modern chip multiprocessor (CMP) architectures, including SIMD width and core-count.
The influence of caches on the performance of heaps
This paper investigates the cache performance of implicit heaps and presents an analytical model called collective analysis that allows cache performance to be predicted as a function of both cache configuration and algorithm configuration.