A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search

Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. Proc. VLDB Endow.
Approximate nearest neighbor search (ANNS) constitutes an important operation in a multitude of applications, including recommendation systems, information retrieval, and pattern recognition. In the past decade, graph-based ANNS algorithms have been the leading paradigm in this domain, with dozens of graph-based ANNS algorithms proposed. Such algorithms aim to provide effective, efficient solutions for retrieving the nearest neighbors for a given query. Nevertheless, these efforts focus on… 
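
Most graph-based ANNS algorithms answer a query with some variant of best-first (beam) search over a proximity graph. The following is a minimal sketch under stated assumptions: an in-memory adjacency list, Euclidean distance, and a single fixed entry point. The function name `beam_search` and the parameters `entry` and `ef` are hypothetical and not taken from the survey. The sketch also counts distance computations (NDC), a cost metric that several of the works listed here aim to reduce.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def beam_search(graph: Dict[int, List[int]], data: List[Vector], query: Vector,
                entry: int, k: int, ef: int) -> Tuple[List[int], int]:
    """Hypothetical best-first search over a proximity graph.

    Returns the ids of the k closest points found and the number of
    distance computations (NDC) performed.
    """
    ndc = 1
    d0 = l2(data[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    results = [(-d0, entry)]     # max-heap via negation: best ef nodes so far
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node is worse than the worst kept result.
        if len(results) >= ef and d > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2(data[nb], query)
            ndc += 1
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    best = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in best[:k]], ndc
```

On a path graph over the points 0, 1, 2, 3, 4 with query 3.2, entry 0, k = 2 and ef = 3, this returns ids [3, 4] after 5 distance computations. A larger `ef` trades more distance computations for higher recall.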

FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search

This paper presents the first graph-based ANNS index that reflects corpus updates in the index in real time without compromising search performance, and designs FreshDiskANN, a system that can index over a billion points on a workstation with an SSD and limited memory.

Tao: A Learning Framework for Adaptive Nearest Neighbor Search using Static Features Only

Tao, a general learning framework for Terminating ANN queries Adaptively using Only static features, is developed; it achieves up to 2.69x speedup, even over its counterpart, at the same high accuracy targets.

A new compressed cover tree guarantees a near linear parameterized complexity for all $k$-nearest neighbors search in metric spaces

This paper describes typical examples in which past cover trees need O(n) iterations, so that the overall worst-case time complexity remains quadratic, as for a brute-force search.
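
For reference, the quadratic brute-force baseline mentioned above is an exhaustive all-k-nearest-neighbors computation that performs about n^2 distance evaluations. This is a minimal sketch assuming points are Python sequences compared with Euclidean distance. `all_knn_brute_force` is a hypothetical name, and the code does not implement the paper's compressed cover tree.

```python
import heapq
import math
from typing import List, Sequence

def all_knn_brute_force(points: List[Sequence[float]], k: int) -> List[List[int]]:
    """Exact all-k-nearest-neighbors by exhaustive comparison (O(n^2) distances).

    Hypothetical baseline: each point is compared with every other point,
    and the point itself is excluded from its own neighbor list.
    """
    result = []
    for i, p in enumerate(points):
        dists = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        # nsmallest keeps only the k best (distance, index) pairs.
        result.append([j for _, j in heapq.nsmallest(k, dists)])
    return result
```

For the 1-D points 0, 1, 3, 7 and k = 2, this returns [[1, 2], [0, 2], [1, 0], [2, 1]].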

A Survey on Deep Reinforcement Learning for Data Processing and Analytics

This work provides a comprehensive review of recent works that use deep reinforcement learning (DRL) to improve data processing and analytics, and introduces key concepts, theories, and methods in DRL.

VStore: in-storage graph based vector search accelerator

This paper presents VStore, a graph-based vector search solution that collaboratively optimizes accuracy, latency, memory, and data movement on large-scale vector data using in-storage computing, and that exhibits significant search efficiency improvements and energy reduction.

Navigable Proximity Graph-Driven Native Hybrid Queries with Structured and Unstructured Constraints

This paper proposes a native hybrid query (NHQ) framework based on proximity graph (PG), which provides the specialized composite index and joint pruning modules for hybrid queries, and presents two novel navigable PGs with optimized edge selection and routing strategies, which obtain better overall performance than existing PGs.

LAN: Learning-based Approximate k-Nearest Neighbor Search in Graph Databases

This paper proposes a learning-based k-ANN search method to reduce the number of distance computations (NDC), introduces a compressed GNN-graph to accelerate the neighbor ranking model and the initial node selection model, and proves that learning efficiency is improved without degrading accuracy.

Survey on Exact kNN Queries over High-Dimensional Data Space

This paper focuses on exact kNN queries and presents a comprehensive survey of exact approaches over high-dimensional data spaces, covering 20 kNN search methods and 9 kNN join methods, and categorises the algorithms by indexing strategy, data and space partitioning technique, and computing paradigm.

Automating Nearest Neighbor Search Configuration with Constrained Optimization

The approximate nearest neighbor (ANN) search problem is fundamental to efficiently serving many real-world machine learning applications. A number of techniques have been developed for ANN search



Approximate Nearest Neighbor Search on High Dimensional Data — Experiments, Analyses, and Improvement

This paper presents a comprehensive experimental evaluation of many state-of-the-art methods for approximate nearest neighbor search and proposes a new method that empirically achieves both high query efficiency and high recall on the majority of the datasets under a wide range of settings.

Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph

A novel graph structure called the Monotonic Relative Neighborhood Graph (MRNG) is proposed, which guarantees very low search complexity (close to logarithmic time); the Navigating Spreading-out Graph (NSG), an approximation of MRNG, is then proposed to further lower the indexing complexity and make it practical for billion-node ANNS problems.
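
The MRNG edge-selection rule can be applied greedily. Scan the candidates in order of increasing distance to p, and keep q only if no already-kept neighbor r is closer to q than p is, which means r does not lie in the lune of p and q. This sketch assumes Euclidean distance, a precomputed candidate set, and a hypothetical degree cap `max_degree`. It illustrates the rule only and does not reproduce the full NSG construction pipeline.

```python
import math
from typing import List, Sequence

def mrng_select(p: Sequence[float], candidates: List[Sequence[float]],
                max_degree: int) -> List[int]:
    """Hypothetical MRNG-style edge filter for node p.

    Candidates are visited by increasing distance to p; candidate q is kept
    unless some already-kept r satisfies dist(r, q) < dist(p, q).
    Returns indices into `candidates`.
    """
    order = sorted(range(len(candidates)), key=lambda i: math.dist(p, candidates[i]))
    kept: List[int] = []
    for i in order:
        q = candidates[i]
        d_pq = math.dist(p, q)
        # Since kept neighbors are no farther from p than q, dist(r, q) < d_pq
        # means r falls in the lune of p and q, so edge p->q is redundant.
        if all(math.dist(candidates[r], q) >= d_pq for r in kept):
            kept.append(i)
            if len(kept) == max_degree:
                break
    return kept
```

With p = (0, 0) and candidates (1, 0), (2, 0), (0, 1.5), the point (2, 0) is pruned because (1, 0) is closer to it than p is, so the result is [0, 2].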

A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs

This paper introduces an algorithm to solve a nearest-neighbor query q by minimizing a kernel function defined by the distance from q to each object in the database, and provides two approaches to select edges in the graph's construction stage that limit memory footprint and reduce the number of free parameters simultaneously.

EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph

EFANNA is the fastest algorithm so far for both approximate nearest neighbor graph construction and approximate nearest neighbor search, and it combines the advantages of hierarchical-structure-based methods and nearest-neighbor-graph-based methods.

Satellite System Graph: Towards the Efficiency Up-Boundary of Graph-Based Approximate Nearest Neighbor Search

Inspired by the message transfer mechanism of communication satellite systems, the paper identifies a new family of monotonic search networks (MSNETs), the Satellite System Graphs (SSG), which inherit the superior ANNS properties of MSNETs and try to ensure that the angles between edges are no smaller than a given value.
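
The angle constraint described above can be sketched as a greedy edge filter. Candidates are scanned by increasing distance to p, and q is kept only if the angle at p between q and every already-kept neighbor is at least alpha. This is a minimal illustration assuming Euclidean vectors. `ssg_select`, `angle_deg`, `alpha_deg`, and `max_degree` are hypothetical names, and candidates that coincide with p are skipped to avoid a zero-length vector.

```python
import math
from typing import List, Sequence

def angle_deg(p: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Angle at p between the vectors p->a and p->b, in degrees."""
    va = [x - y for x, y in zip(a, p)]
    vb = [x - y for x, y in zip(b, p)]
    dot = sum(x * y for x, y in zip(va, vb))
    cos = dot / (math.hypot(*va) * math.hypot(*vb))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def ssg_select(p: Sequence[float], candidates: List[Sequence[float]],
               alpha_deg: float, max_degree: int) -> List[int]:
    """Hypothetical SSG-style edge filter: keep candidate q only if the angle
    at p between q and every already-kept neighbor is at least alpha_deg."""
    order = sorted(range(len(candidates)), key=lambda i: math.dist(p, candidates[i]))
    kept: List[int] = []
    for i in order:
        if math.dist(p, candidates[i]) == 0.0:
            continue  # a point identical to p has no direction; skip it
        if all(angle_deg(p, candidates[r], candidates[i]) >= alpha_deg for r in kept):
            kept.append(i)
            if len(kept) == max_degree:
                break
    return kept
```

With alpha set to 60 degrees, the candidate (2, 0.1) is rejected because it lies only about 3 degrees from the already-kept (1, 0). The candidate (0, 1.5), at 90 degrees, is kept.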

Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination

This work builds and trains gradient boosting decision tree models to learn and predict when to stop searching for a given query, applies the learned adaptive early termination to state-of-the-art ANN approaches, and evaluates the end-to-end performance on three million- to billion-scale datasets.

Graph based Nearest Neighbor Search: Promises and Failures

The hierarchical structure proposed in HNSW could not achieve the "much better logarithmic complexity scaling" claimed in the original paper, particularly in high-dimensional cases; the paper also finds that similar search efficiency can be achieved with a flat k-NN graph after graph diversification.

Multiattribute approximate nearest neighbor search based on navigable small world graph

This paper proposes MA-NSW, a novel approach to multiattribute ANNS based on the navigable small world (NSW) graph; it guarantees efficiency and is defined for arbitrary metric spaces (e.g., Euclidean distance and cosine similarity).

Query-driven iterated neighborhood graph search for large scale indexing

This paper presents a criterion to check whether local search over a neighborhood graph has arrived at a local solution, and follows the iterated local search (ILS) strategy, widely used in combinatorial optimization, to find solutions beyond a local optimum.
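
Iterated local search over a neighborhood graph follows a general pattern. Hill-climb until no neighbor is closer to the query (a local optimum), perturb by jumping to a node not yet visited, search again, and keep the better result. This is a minimal sketch. The local-optimum test, the two-hop perturbation, and the improvement-only acceptance rule are assumptions for illustration, not the criterion proposed in the paper. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Graph = Dict[int, List[int]]

def greedy_descent(graph: Graph, data: List[Sequence[float]], query: Sequence[float],
                   start: int, visited: Set[int]) -> Tuple[int, float]:
    """Move to the closest neighbor until none is closer (a local optimum)."""
    cur, cur_d = start, math.dist(data[start], query)
    visited.add(start)
    while True:
        best, best_d = cur, cur_d
        for nb in graph[cur]:
            visited.add(nb)
            d = math.dist(data[nb], query)
            if d < best_d:
                best, best_d = nb, d
        if best == cur:  # assumed local-optimum test: no neighbor improves
            return cur, cur_d
        cur, cur_d = best, best_d

def perturb(graph: Graph, node: int, visited: Set[int], n: int,
            rng: random.Random) -> int:
    """Assumed perturbation: jump to an unvisited node within two hops of `node`,
    falling back to any unvisited node. Returns -1 if every node was visited."""
    two_hop = {m for nb in graph[node] for m in graph[nb]} | set(graph[node])
    pool = sorted(two_hop - visited) or [i for i in range(n) if i not in visited]
    return rng.choice(pool) if pool else -1

def iterated_local_search(graph: Graph, data: List[Sequence[float]],
                          query: Sequence[float], iterations: int,
                          seed: int = 0) -> Tuple[int, float]:
    rng = random.Random(seed)
    visited: Set[int] = set()
    best, best_d = greedy_descent(graph, data, query, rng.randrange(len(data)), visited)
    for _ in range(iterations):
        start = perturb(graph, best, visited, len(data), rng)
        if start < 0:
            break  # nothing left to explore
        node, d = greedy_descent(graph, data, query, start, visited)
        if d < best_d:  # assumed acceptance rule: keep only improvements
            best, best_d = node, d
    return best, best_d
```

On a graph split into two components, plain greedy descent can stop inside the wrong component. The perturbation step lets the search escape that local optimum.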