Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor
@article{Herruzo2020AcceleratingSA,
title={Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor},
author={Jose M. Herruzo and Sonia Gonz{\'a}lez-Navarro and Pablo Ib{\'a}{\~n}ez-Mar{\'i}n and V{\'i}ctor Vi{\~n}als-Y{\'u}fera and Jes{\'u}s Alastruey-Bened{\'e} and Oscar G. Plata},
journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
year={2020},
volume={17},
pages={1093-1104}
}

FM-index is a compact data structure suitable for fast matching of short reads to large reference genomes. The matching algorithm using this index exhibits irregular memory access patterns that cause frequent cache misses, resulting in a memory-bound problem. This paper analyzes different FM-index versions presented in the literature, focusing on those computing aspects related to data access. As a result of the analysis, we propose a new organization of FM-index that minimizes the demand…
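The exact-matching step (backward search) that the abstract describes as memory bound can be sketched in Python as follows. This is a minimal, uncompressed FM-index for illustration only, not the organization proposed in the paper. It assumes a text terminated by the sentinel `$`, builds the suffix array by naive sorting, stores the `Occ` table in full, and uses hypothetical helper names `build_fm_index` and `backward_search`.

```python
def build_fm_index(text):
    """Build a naive FM-index for text (assumed to end with the sentinel '$')."""
    # Suffix array by plain sorting: fine for a toy example, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] for suffix 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in text lexicographically smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i] (full, uncompressed table).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) of pattern matches."""
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        # Two Occ lookups per pattern symbol. lo and hi can land far apart in
        # the table, so each step tends to touch new cache lines.
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


text = "ACGTACGTTACG$"
sa, C, occ = build_fm_index(text)
lo, hi = backward_search("ACG", C, occ, len(text))
positions = sorted(sa[lo:hi])
print(positions)  # prints [0, 4, 9]
```

The number of `Occ` lookups grows with pattern length, not reference length, which is why the index suits short reads; the cost that remains is the latency of those scattered lookups once the table no longer fits in cache.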
6 Citations
Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps
- Computer Science, Biology; IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2022
This work proposes COFI, a COmpressed FM-Index for large K-steps, which enables a 15-step FM-index using less than 16 GB for a human genome reference of 3 giga base pairs and achieves average speed-ups of 1.46× and 1.39×, respectively.
Enabling fast and energy-efficient FM-index exact matching using processing-near-memory
- Computer Science; J. Supercomput.
- 2021
A performance and energy evaluation of two classes of processor architectures when executing the FM-index exact matching algorithm, as a reference algorithm for exact sequence alignment, based on complex cores and DDR3/4 SDRAM memory technology.
Genome Sequence Alignment - Design Space Exploration for Optimal Performance and Energy Architectures
- Computer Science; IEEE Transactions on Computers
- 2021
This work proposes an architecture based on ARMv8 cores and demonstrates that 16 ARMv8 64-bit OoO cores with HBM2 outperform 32 cores of the Intel Xeon Phi Knights Landing (KNL) processor with 3D-stacked memory.
Multiprocess Implementation of DNA Pre-alignment Filtering using the Bit Matrix Algorithm
- Computer Science; 2020 IEEE 12th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM)
- 2020
This manuscript focuses on the bit matrix pre-alignment filter with a goal of improving the implementation by using multiprocessing techniques that exploit multi-core CPUs.
LISA: Learned Indexes for Sequence Analysis
- Computer Science
- 2020
This paper introduces LISA (Learned Indexes for Sequence Analysis), a novel learning-based approach to DNA sequence search that achieves speedups of up to 2.2× and 10.8× over state-of-the-art FM-index-based implementations for exact search and super-maximal exact match (SMEM) search, respectively.
A survey on evaluating and optimizing performance of Intel Xeon Phi
- Computer Science; Concurr. Comput. Pract. Exp.
- 2020
A survey of works that study the architecture of the Xeon Phi and use it as an accelerator for a broad range of applications; it presents performance optimization strategies as well as the factors that bottleneck Phi performance.
References
Showing 1-10 of 45 references
Boosting the FM-Index on the GPU: Effective Techniques to Mitigate Random Memory Access
- Computer Science; IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2015
This work shows that several strategies can be put in place to remove the memory bottleneck on the GPU: more compact indexes can be implemented by having more threads work cooperatively on larger memory blocks, and a k-step FM-index can be used to further reduce the number of memory accesses.
An Exact Matching Approach for High Throughput Sequencing Based on BWT and GPUs
- Computer Science; 2011 14th IEEE International Conference on Computational Science and Engineering
- 2011
This paper analyzes the Burrows-Wheeler Transform (BWT) thoroughly, proposes several optimizations for exact sequence matching, and develops efficient High Throughput Sequencing (HTS) models on both CPU and GPU for practical applications.
FHAST: FPGA-Based Acceleration of Bowtie in Hardware
- Computer Science; IEEE/ACM Transactions on Computational Biology and Bioinformatics
- 2015
FHAST (FPGA hardware accelerated sequence-matching tool) is a drop-in replacement for Bowtie that uses a hardware design based on field programmable gate arrays (FPGAs) and masks memory latency by executing multiple concurrent hardware threads that access memory simultaneously.
FM-Index on GPU: A Cooperative Scheme to Reduce Memory Footprint
- Computer Science; 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications
- 2014
Here it is shown that the combination of a compact design of the FM-index and a thread-cooperative approach can be used to restore a proper balance and allow full exploitation of the computational resources of the GPU across several GPU architectures.
Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++
- Computer Science; PLoS ONE
- 2016
This work presents an efficient parallelization approach for NGS short-read alignment on multi-core clusters that takes advantage of a distributed shared memory programming model based on the new UPC++ language.
Compressed indexing and local alignment of DNA
- Computer Science; Bioinform.
- 2008
This article shows how to build software called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments; it is the first practical tool that can find all local alignments.
BarraCUDA - a fast short read sequence aligner using graphics processing units
- Computer Science; BMC Research Notes
- 2011
This paper describes the implementation of BarraCUDA, a GPGPU sequence alignment program based on BWA that takes advantage of the massive parallelism of GPUs to accelerate the alignment of sequencing reads to a reference DNA sequence.
HISAT: a fast spliced aligner with low memory requirements
- Biology; Nature Methods
- 2015
Tests showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method, and requires only 4.3 gigabytes of memory.
Knights Landing: Second-Generation Intel Xeon Phi Product
- Computer Science; IEEE Micro
- 2016
This article describes the architecture of Knights Landing, the second-generation Intel Xeon Phi product family, which targets high-performance computing and other highly parallel workloads. It…
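Several of the referenced works reduce the FM-index memory footprint ("more compact indexes", "a compact design of the FM-index") at the cost of extra work per lookup. One common way to do this, sketched here generically rather than as the scheme of any paper listed above, is to store `Occ` counts only at checkpoints every `step` BWT positions and count the remaining symbols at query time. The class name `SampledOcc` and the parameter `step` are hypothetical, and Python dictionaries are used for clarity, not for memory efficiency.

```python
class SampledOcc:
    """Occ(c, i) with counts stored only every `step` BWT positions (checkpoints)."""

    def __init__(self, bwt, step=4):
        self.bwt = bwt
        self.step = step
        # checkpoints[k][c] = occurrences of c in bwt[:k * step]
        self.checkpoints = []
        counts = {c: 0 for c in sorted(set(bwt))}
        for i, ch in enumerate(bwt):
            if i % step == 0:
                self.checkpoints.append(dict(counts))
            counts[ch] += 1
        # Make sure a checkpoint exists for i == len(bwt) as well.
        if len(bwt) % step == 0:
            self.checkpoints.append(dict(counts))

    def occ(self, c, i):
        """Occurrences of c in bwt[:i], for 0 <= i <= len(bwt)."""
        k = i // self.step
        base = self.checkpoints[k].get(c, 0)
        # Scan at most step - 1 symbols between the checkpoint and position i.
        start = k * self.step
        return base + self.bwt[start:i].count(c)


index = SampledOcc("ACGTTGCAAGT$C", step=4)
print(index.occ("G", 10))  # prints 3
```

Larger `step` values shrink the stored counts roughly by that factor but lengthen the scan per lookup; production indexes typically pack the BWT into bit vectors so the scan becomes a few population-count instructions.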