Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor

@article{Herruzo2020AcceleratingSA,
  title={Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor},
  author={Jose M. Herruzo and Sonia Gonz{\'a}lez-Navarro and Pablo Ib{\'a}{\~n}ez-Mar{\'i}n and V{\'i}ctor Vi{\~n}als-Y{\'u}fera and Jes{\'u}s Alastruey-Bened{\'e} and Oscar G. Plata},
  journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  year={2020},
  volume={17},
  pages={1093--1104}
}
FM-index is a compact data structure suitable for fast matches of short reads to large reference genomes. The matching algorithm using this index exhibits irregular memory access patterns that cause frequent cache misses, resulting in a memory bound problem. This paper analyzes different FM-index versions presented in the literature, focusing on those computing aspects related to the data access. As a result of the analysis, we propose a new organization of FM-index that minimizes the demand… 
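The irregular access pattern mentioned in the abstract can be seen in a minimal Python sketch of FM-index exact matching (backward search). This is a toy illustration only, not the index organization the paper proposes: the function names `build_fm_index` and `count_matches`, the naive suffix sort, and the uncompressed occurrence table are all assumptions made for readability.

```python
def build_fm_index(text):
    """Build a toy FM-index (C table and Occ table) for text plus a '$' sentinel."""
    text += "$"
    # Naive suffix array construction; acceptable only for tiny inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: number of occurrences of ch in bwt[:i] (stored in full here).
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c_table, occ, len(bwt)


def count_matches(pattern, c_table, occ, n):
    """Count exact occurrences of pattern using backward search."""
    lo, hi = 0, n  # half-open interval [lo, hi) over the sorted suffixes
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        # Two Occ lookups per character, at positions that depend on the
        # previous step; on a genome-sized index these land far apart.
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, after `c_table, occ, n = build_fm_index("ACGTACGT")`, calling `count_matches("ACG", c_table, occ, n)` returns 2. Each loop iteration reads the occurrence table at two data-dependent positions, which is why the search tends to miss in cache once the index is much larger than the cache.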
Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps
TLDR
This work proposes COFI, a COmpressed FM-Index for large K-steps, which enables a 15-step FM-index using less than 16 GB for a human genome reference of 3 giga base pairs and achieves average speed-ups of 1.46× and 1.39×, respectively.
Enabling fast and energy-efficient FM-index exact matching using processing-near-memory
TLDR
A performance and energy evaluation of two classes of processor architectures when executing the FM-index exact matching algorithm, as a reference algorithm for exact sequence alignment, based on complex cores and DDR3/4 SDRAM memory technology.
Genome Sequence Alignment - Design Space Exploration for Optimal Performance and Energy Architectures
TLDR
This work proposes an architecture based on ARMv8 cores and demonstrates that 16 ARMv8 64-bit OoO cores with HBM2 outperform 32 cores of the Intel Xeon Phi Knights Landing (KNL) processor with 3D stacked memory.
Multiprocess Implementation of DNA Pre-alignment Filtering using the Bit Matrix Algorithm
TLDR
This manuscript focuses on the bit matrix pre-alignment filter with a goal of improving the implementation by using multiprocessing techniques that exploit multi-core CPUs.
LISA: Learned Indexes for Sequence Analysis
TLDR
This paper introduces LISA (Learned Indexes for Sequence Analysis), a novel learning-based approach to DNA sequence search that achieves speedups of up to 2.2× and 10.8× over the state-of-the-art FM-index based implementations for exact search and super-maximal exact match (SMEM) search, respectively.
A survey on evaluating and optimizing performance of Intel Xeon Phi
TLDR
This survey covers works that study the architecture of Phi and use it as an accelerator for a broad range of applications, and presents performance optimization strategies as well as the factors that bottleneck the performance of Phi.

References

Boosting the FM-Index on the GPU: Effective Techniques to Mitigate Random Memory Access
TLDR
This work shows that several strategies can be put in place to remove the memory bottleneck on the GPU: more compact indexes can be implemented by having more threads work cooperatively on larger memory blocks, and a k-step FM-index can be used to further reduce the number of memory accesses.
n-step FM-Index for Faster Pattern Matching
An Exact Matching Approach for High Throughput Sequencing Based on BWT and GPUs
  • Su Chen, Hai Jiang
  • Computer Science
    2011 14th IEEE International Conference on Computational Science and Engineering
  • 2011
TLDR
This paper thoroughly analyzes the Burrows-Wheeler Transformation (BWT), proposes several optimizations for exact sequence matching, and develops efficient High Throughput Sequencing (HTS) models on both CPU and GPU for practical applications.
FHAST: FPGA-Based Acceleration of Bowtie in Hardware
TLDR
FHAST (FPGA hardware accelerated sequence-matching tool) is a drop-in replacement for BOWTIE that uses a hardware design based on field programmable gate arrays (FPGAs) and masks memory latency by executing multiple concurrent hardware threads that access memory simultaneously.
FM-Index on GPU: A Cooperative Scheme to Reduce Memory Footprint
TLDR
Here it is shown that the combination of a compact design of the FM-index and a thread-cooperative approach can be used to restore a proper balance and allow full exploitation of the computational resources of the GPU across several GPU architectures.
Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++
TLDR
This work presents an efficient parallelization approach for NGS short-read alignment on multi-core clusters that takes advantage of a distributed shared memory programming model based on the new UPC++ language.
Compressed indexing and local alignment of DNA
TLDR
This article shows how to build a software tool called BWT-SW that exploits a BWT index of a text T to speed up the dynamic programming for finding all local alignments, which is the first practical tool that can find all local alignments.
BarraCUDA - a fast short read sequence aligner using graphics processing units
TLDR
This work describes the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA that takes advantage of the massive parallelism of GPUs to accelerate the alignment of sequencing reads to a reference DNA sequence.
HISAT: a fast spliced aligner with low memory requirements
TLDR
Tests showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method, and requires only 4.3 gigabytes of memory.
Knights Landing: Second-Generation Intel Xeon Phi Product
This article describes the architecture of Knights Landing, the second-generation Intel Xeon Phi product family, which targets high-performance computing and other highly parallel workloads. It…