GPU-Accelerated BWA-MEM Genomic Mapping Algorithm Using Adaptive Load Balancing

Ernst Houtgast, Vlad Mihai Sima, Koen Bertels, Zaid Al-Ars
Genomic sequencing is rapidly becoming a premier generator of Big Data, posing great computational challenges. [...] Key result: compared to not using load balancing, this provides up to 46% more performance.

An Efficient GPU-Accelerated Implementation of Genomic Short Read Mapping with BWA-MEM

A GPU-accelerated implementation of BWA-MEM is proposed that obtains a twofold overall application-level speedup, the maximum theoretically achievable.

Power-efficiency analysis of accelerated BWA-MEM implementations on heterogeneous computing platforms

The power-efficiency of BWA-MEM, a popular tool for genomic data mapping, is studied on two heterogeneous architectures, and base pairs per Joule is introduced as a unit of power-efficiency.

Improving Performance of Genomic Aligners on Intel Xeon Phi-Based Architectures

  • Shaolong Chen, M. A. Senar
  • Computer Science
    2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
  • 2018
A multi-level strategy (MDPR) based on data parallelization and data replication is proposed; it can be readily extended to other sequence alignment tools whose operating principles are similar to those of the BWA aligner.

MEDAL: Scalable DIMM based Near Data Processing Accelerator for DNA Seeding Algorithm

MEDAL, a practical, energy-efficient, Dual-Inline Memory Module (DIMM) based Near Data Processing (NDP) accelerator for the DNA seeding algorithm, is proposed. It is built from off-the-shelf DRAM components and uses an algorithm-specific data compression technique to reduce the memory footprint, free more space for data mapping, and reduce communication overhead.

Power-Efficient Accelerated Genomic Short Read Mapping on Heterogeneous Computing Platforms

A novel FPGA-accelerated implementation of BWA-MEM, a popular tool for genomic data mapping, is proposed, with a twofold speedup in overall application-level performance and a 1.6x gain in power-efficiency.

Comparative Analysis of System-Level Acceleration Techniques in Bioinformatics: A Case Study of Accelerating the Smith-Waterman Algorithm for BWA-MEM

Three accelerated implementations of the widely used BWA-MEM genomic mapping tool are compared as a case study on design-time optimization for heterogeneous architectures, each using an optimized Smith-Waterman algorithm implementation.

On Hardware-Accelerated Maximally-Efficient Systolic Arrays: Acceleration and Optimization of Genomics Pipelines Through Hardware/Software Co-Design

Various techniques to improve the efficiency of systolic arrays for short sequence lengths are proposed, including the Variable Logical Length, the Variable Physical Length, and the Variable Logical and Physical Length systolic array architectures, which eliminate the dependence of systolic array efficiency on read sequence length.

Accelerated HaplotypeCaller DNA Analysis Application on FPGAs Using CAPI

A fast and efficient implementation of a Field Programmable Gate Array (FPGA) based, streaming multicore architecture for accelerating variant calling algorithms will be designed, focused on the HaplotypeCaller which is the variant calling software part of the Genome Analysis Toolkit (GATK).



An FPGA-based systolic array to accelerate the BWA-MEM genomic mapping algorithm

This work presents the first accelerated implementation of BWA-MEM, a popular genome sequence alignment algorithm widely used in next generation sequencing genomics pipelines, and proposes and evaluates a number of FPGA-based systolic array architectures, presenting optimizations generally applicable to variable length Smith-Waterman execution.

Heterogeneous hardware/software acceleration of the BWA-MEM DNA alignment algorithm

This paper presents an accelerated version of BWA-MEM, one of the most popular read alignment algorithms, by implementing a heterogeneous hardware/software optimized version on the Convey HC2ex platform.

DOPA: GPU-based protein alignment using database and memory access optimizations

This paper presents a high-performance protein sequence alignment implementation for Graphics Processing Units (GPUs) that achieves 21.4 Giga Cell Updates Per Second (GCUPS), 1.13 times better than the fastest GPU implementation to date.

CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions

This algorithm obtains significant speedups over its predecessor, CUDASW++ 2.0, by using CPU and GPU SIMD instructions and by executing concurrently on CPUs and GPUs.

StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

StarPU is an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. Its dynamic approach is shown to compete with the highly optimized MAGMA library and to overcome the limitations of the corresponding static scheduling in a portable way.

Bio-sequence database scanning on a GPU

A new approach to bio-sequence database scanning is presented that uses computer graphics hardware to gain high performance at low cost, reformulating the Smith-Waterman dynamic programming algorithm in terms of computer graphics primitives.

A Smith-Waterman Systolic Cell

In this paper, an improved systolic processing element cell for implementing the Smith-Waterman algorithm on a Xilinx Virtex FPGA is presented.

Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM

BWA-MEM automatically chooses between local and end-to-end alignments, supports paired-end reads and performs chimeric alignment. The algorithm is robust to sequencing errors and applicable to a wide range of sequence lengths, from 70 bp to a few megabases.

An analytical framework for optimizing variant discovery from personal genomes

The genome comparison and analytic testing (GCAT) platform is presented to facilitate development of performance metrics and comparisons of analysis tools across these metrics, with support for data slicing and filtering.

Fast gapped-read alignment with Bowtie 2

Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.