Corpus ID: 237504755

Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search

  • Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott Weitze, Tong Geng, Ang Li, Jinbo Bi, Minghu Song, Weiwen Jiang, Hang Liu, Caiwen Ding
  • Published 13 September 2021
  • Computer Science
  • arXiv
Molecular similarity search has been widely used in drug discovery to rapidly identify structurally similar compounds in large molecular databases. As chemical libraries grow, there is increasing interest in accelerating large-scale similarity search efficiently. Existing work mainly uses CPUs and GPUs to accelerate computation of the Tanimoto coefficient, which measures the pairwise similarity between molecular fingerprints. In this paper, we propose and…
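The Tanimoto coefficient compares two binary fingerprints A and B as T(A, B) = |A ∧ B| / |A ∨ B|, that is, the number of bits set in both divided by the number of bits set in either. Below is a minimal Python sketch. It stores fingerprints as plain Python integers, which is an illustrative choice for brevity and not the paper's FPGA data layout.

```python
def popcount(x: int) -> int:
    """Number of set bits (works on any Python 3 version, unlike int.bit_count)."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient popcount(A & B) / popcount(A | B) of two fingerprints."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return popcount(fp_a & fp_b) / union


# Pairwise scoring of one query against a small in-memory "database".
query = 0b1011
database = [0b0011, 0b1011, 0b0100]
scores = [tanimoto(query, fp) for fp in database]
```

A brute-force search repeats this AND/OR/popcount step for every database entry. That per-pair work is the computation that CPU, GPU, and FPGA designs try to accelerate.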


Benchmark on Indexing Algorithms for Accelerating Molecular Similarity Search
This paper implements Python or C++ code to enable Tanimoto similarity search via several recent indexing algorithms, such as HNSW and ONNG, and demonstrates that graph-based methods consistently achieve the best trade-off between search effectiveness and search efficiency.
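HNSW and ONNG are more elaborate graph indexes (HNSW is multi-layered), and full construction of either is beyond a short example. The sketch below is only a rough illustration of why graph-based search works. It runs a best-first search over a single brute-force k-nearest-neighbor graph under Tanimoto similarity, in the style of the per-layer search used by HNSW. The function names, the `ef` beam-width parameter, and the integer fingerprint representation are illustrative assumptions, not the benchmark paper's code.

```python
import heapq
from typing import Dict, List, Tuple


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return 1.0 if union == 0 else popcount(a & b) / union


def build_knn_graph(fps: List[int], k: int) -> Dict[int, List[int]]:
    """Brute-force k-nearest-neighbor graph: node index -> its k most similar nodes."""
    graph: Dict[int, List[int]] = {}
    for i, fp in enumerate(fps):
        others = [j for j in range(len(fps)) if j != i]
        others.sort(key=lambda j: tanimoto(fp, fps[j]), reverse=True)
        graph[i] = others[:k]
    return graph


def greedy_search(query: int, fps: List[int], graph: Dict[int, List[int]],
                  entry: int, ef: int, top_k: int) -> List[Tuple[float, int]]:
    """Best-first graph search; returns up to top_k (similarity, index) pairs, best first."""
    visited = {entry}
    sim0 = tanimoto(query, fps[entry])
    candidates = [(-sim0, entry)]  # max-heap on similarity (via negation): nodes to expand
    results = [(sim0, entry)]      # min-heap: the best `ef` nodes found so far
    while candidates:
        neg_sim, node = heapq.heappop(candidates)
        if len(results) >= ef and -neg_sim < results[0][0]:
            break  # best unexpanded candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            sim = tanimoto(query, fps[nb])
            if len(results) < ef or sim > results[0][0]:
                heapq.heappush(candidates, (-sim, nb))
                heapq.heappush(results, (sim, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted(results, reverse=True)[:top_k]
```

A sparse graph (small `k`) and a narrow beam (small `ef`) make the search faster but can miss true nearest neighbors. This recall-versus-speed trade-off is roughly what such benchmarks measure.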
Efficient Large-Scale Approximate Nearest Neighbor Search on OpenCL FPGA
The proposed method significantly outperforms state-of-the-art methods on CPU and GPU for high-dimensional nearest neighbor queries on billion-scale datasets in terms of query time and accuracy, regardless of the batch size.
Approximate Similarity Search with FAISS Framework Using FPGAs on the Cloud
This paper describes and implements a novel design for a hardware-accelerated approximate KNN algorithm built on the FAISS framework using FPGA-OpenCL platforms in the cloud. It shows how the persistent index build times on large-scale inputs for similarity search can be handled in hardware, even outperforming other high-performance systems.
Billion-Scale Similarity Search with GPUs
This paper proposes a novel design for k-selection that enables the construction of high-accuracy brute-force, approximate, and compressed-domain search based on product quantization, and applies it in different similarity search scenarios.
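k-selection means keeping the k best scores out of a long list of candidate similarities. The GPU paper implements it with a specialized GPU design. The sketch below is only a sequential CPU analogue. It uses a bounded min-heap from Python's `heapq` and costs O(n log k) for n scores. The function name `k_select` is hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def k_select(scores: Iterable[float], k: int) -> List[Tuple[float, int]]:
    """Return the k highest (score, index) pairs, best first, using a size-k min-heap."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # heap[0] is the worst of the best k seen so far
    for idx, s in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (s, idx))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, idx))  # evict the current worst, insert the new score
    return sorted(heap, reverse=True)
```

Only the current k-th best score must be compared against each new candidate, so most scores are rejected with a single comparison.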
Bounds and Algorithms for Fast Exact Searches of Chemical Fingerprints in Linear and Sublinear Time
The theoretical considerations and experiments show that this pruning approach can provide linear speedups of one or more orders of magnitude for searches with a fixed threshold, and achieve sublinear speedups in the range of O(|D|^0.6) for the top K hits in current large databases.
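The pruning rests on a simple bound. Let a = |A| and b = |B| be the popcounts of two fingerprints. Then |A ∧ B| ≤ min(a, b) and |A ∨ B| ≥ max(a, b), so T(A, B) ≤ min(a, b) / max(a, b). For a threshold t, only database fingerprints with t·a ≤ b ≤ a/t can qualify. Below is a minimal sketch. It assumes integer fingerprints and a hypothetical index that buckets fingerprints by popcount. The paper's actual data structures may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return 1.0 if union == 0 else popcount(a & b) / union


def build_popcount_index(fps: List[int]) -> Dict[int, List[int]]:
    """Group database indices by the popcount of their fingerprint."""
    index: Dict[int, List[int]] = defaultdict(list)
    for i, fp in enumerate(fps):
        index[popcount(fp)].append(i)
    return index


def threshold_search(query: int, fps: List[int],
                     index: Dict[int, List[int]], t: float) -> List[int]:
    """Indices with Tanimoto >= t (0 < t <= 1), scanning only feasible popcount buckets."""
    a = popcount(query)
    # Bound: T <= min(a, b) / max(a, b), so only t*a <= b <= a/t can reach t.
    # The small epsilon keeps float rounding from excluding a valid bucket.
    lo = max(0, math.ceil(t * a - 1e-9))
    hi = math.floor(a / t + 1e-9)
    hits = []
    for b in range(lo, hi + 1):
        for i in index.get(b, []):
            if tanimoto(query, fps[i]) >= t:  # exact check on the survivors
                hits.append(i)
    return hits
```

Raising the threshold narrows the popcount window, so fewer buckets are scanned. This is the source of the fixed-threshold speedups described above.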
Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning
This paper investigates column-balanced block-wise pruning on Transformers and designs an FPGA acceleration engine customized for the balanced block-wise matrix multiplication.
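The following is our reading of column-balanced block pruning, stated as an assumption rather than as the paper's exact procedure. Split a weight matrix into bh × bw blocks. In every block column, keep only the `keep` blocks with the largest Frobenius norm and zero the rest. Every block column then retains the same number of nonzero blocks, so parallel hardware units can receive equal work. The sketch below uses plain Python lists, and its function and parameter names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def column_balanced_block_prune(w: Matrix, bh: int, bw: int, keep: int) -> Matrix:
    """Zero all but the `keep` largest-norm blocks in every block column of `w`.

    Assumes the matrix dimensions are multiples of bh and bw. Returns a new matrix.
    """
    rows, cols = len(w), len(w[0])
    assert rows % bh == 0 and cols % bw == 0, "dimensions must be multiples of the block size"
    out = [row[:] for row in w]  # copy so the input is not modified
    n_block_rows = rows // bh
    for bc in range(cols // bw):
        c0 = bc * bw
        # Frobenius norm of each block in this block column.
        norms = []
        for br in range(n_block_rows):
            r0 = br * bh
            sq = sum(w[r][c] ** 2 for r in range(r0, r0 + bh) for c in range(c0, c0 + bw))
            norms.append((math.sqrt(sq), br))
        kept = {br for _, br in sorted(norms, reverse=True)[:keep]}
        # Zero every block that was not kept.
        for br in range(n_block_rows):
            if br not in kept:
                for r in range(br * bh, br * bh + bh):
                    for c in range(c0, c0 + bw):
                        out[r][c] = 0.0
    return out
```

Because each block column keeps exactly `keep` blocks, the sparse matrix multiply has the same amount of work per column. That regularity is what makes this pruning pattern hardware-friendly.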
Reoptimization of MDL Keys for Use in Drug Discovery
This paper reports improvements in the performance of MDL keysets reoptimized for molecular similarity and explores the use of genetic algorithms to select optimal keysets.
Hashing Algorithms and Data Structures for Rapid Searches of Fingerprint Vectors
It is shown how to rapidly compute a bound on the Jaccard-Tanimoto similarity of two fingerprints using the intersection bound. This allows the search space to be pruned significantly by discarding molecules with unfavorable bounds.
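The intersection bound can be sketched as follows. As a hypothetical choice (the paper's hashing scheme may differ), bit positions are grouped into m buckets by position mod m, and each fingerprint stores its per-bucket set-bit counts A_i. Within a bucket, the shared set bits cannot exceed the smaller of the two counts, so c = |A ∧ B| ≤ Σ_i min(A_i, B_i). The similarity c / (a + b - c) increases with c, so plugging in this bound gives an upper bound on Tanimoto similarity. Any molecule whose bound falls below the threshold can then be skipped without an exact comparison. All function names below are illustrative.

```python
from typing import List


def bucket_counts(fp: int, n_bits: int, m: int) -> List[int]:
    """Count set bits of `fp` in each of `m` buckets (bucket = bit position mod m)."""
    counts = [0] * m
    for pos in range(n_bits):
        if (fp >> pos) & 1:
            counts[pos % m] += 1
    return counts


def tanimoto_upper_bound(ca: List[int], cb: List[int]) -> float:
    """Upper bound on Tanimoto similarity from per-bucket counts alone."""
    a, b = sum(ca), sum(cb)
    inter = sum(min(x, y) for x, y in zip(ca, cb))  # never smaller than the true |A & B|
    union = a + b - inter
    return 1.0 if union == 0 else inter / union


def pruned_search(query: int, fps: List[int], n_bits: int, m: int, t: float) -> List[int]:
    """Indices with Tanimoto >= t; exact scoring only where the bound allows it.

    For clarity, the counts are recomputed here. A real index would precompute them.
    """
    qc = bucket_counts(query, n_bits, m)
    hits = []
    for i, fp in enumerate(fps):
        if tanimoto_upper_bound(qc, bucket_counts(fp, n_bits, m)) < t:
            continue  # the bound rules this molecule out
        union = bin(query | fp).count("1")
        sim = 1.0 if union == 0 else bin(query & fp).count("1") / union
        if sim >= t:
            hits.append(i)
    return hits
```

Fewer buckets make the counts cheaper to compare but the bound looser. With m equal to the fingerprint length, the bound becomes exact.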
The chemfp project
  • A. Dalke
  • Medicine, Computer Science
  • Journal of Cheminformatics
  • 2019
The chemfp project has had four main goals: (1) promote the FPS format as a text-based exchange format for dense binary cheminformatics fingerprints, (2) develop a high-performance implementation of…
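Our understanding of the FPS format should be checked against the chemfp documentation before relying on it. A file starts with `#`-prefixed header lines such as `#FPS1` and `#num_bits=166`. Each following line holds one record: the hex-encoded fingerprint, a tab, and the identifier. The minimal reader sketch below has a hypothetical name (`parse_fps`) and ignores any optional extra fields. It converts each fingerprint to an integer using little-endian byte order. That byte order is an arbitrary choice here, and Tanimoto scores do not depend on it as long as it is applied consistently.

```python
from typing import Dict, List, Tuple


def parse_fps(text: str) -> Tuple[Dict[str, str], List[Tuple[str, int]]]:
    """Parse a minimal FPS-style document into (metadata, [(id, fingerprint_int)])."""
    metadata: Dict[str, str] = {}
    records: List[Tuple[str, int]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            # Header lines such as "#FPS1" (no value) or "#num_bits=166" (key=value).
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key] = value
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected '<hex fingerprint>\\t<id>', got {line!r}")
        hex_fp, fp_id = fields[0], fields[1]
        records.append((fp_id, int.from_bytes(bytes.fromhex(hex_fp), "little")))
    return metadata, records
```

Keeping the exchange format as plain hex text makes files easy to inspect and diff. A search engine would typically convert them to a packed binary layout once at load time.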
Engineering Efficient and Effective Non-metric Space Library
A new similarity search library is presented. The authors take the position that engineering is as important as algorithm design, and they pursue the goal of producing realistic benchmarks that support this view.