Parallelizing the Smith-Waterman Local Alignment Algorithm using CUDA

Abstract

Given two strings S1 = pqaxabcstrqrtp and S2 = xyaxbacsl, the substrings axabcs in S1 and axbacs in S2 are very similar. The problem of finding such similar substrings is the local alignment problem. Local alignment is used extensively in computational biology to find regions of similarity between biological sequences. Similar genetic sequences are identified by computing the local alignment of a given sequence against a number of other genetic sequences. Protein molecules fold into unique three-dimensional shapes, and different regions fold into different structures such as helices and sheets. These shapes determine the function of the proteins, and local alignment helps identify regions of structural similarity. BLAST and FASTA are two widely used programs that compute local alignments of a sequence against a database of other genetic sequences.

Formally, consider a scoring scheme that assigns a score to each pair of aligned characters and a cost to inserting a character into one sequence (equivalently, introducing a gap in the other sequence). Under such a scheme, a local alignment of strings S1 and S2 is a pair of substrings, s1 of S1 and s2 of S2, whose alignment score is maximum over all pairs of substrings of S1 and S2. Unlike the global alignment problem, where the entire strings must be matched, the local alignment problem identifies highly similar substrings. Also, unlike the edit distance problem, where the goal is to minimize the cost of transforming one sequence into another, the local alignment problem maximizes a similarity score.
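The abstract defines local alignment through a scoring scheme but does not spell out the Smith-Waterman recurrence. The following is a minimal sketch of that dynamic program, computing only the best local alignment score. It assumes illustrative parameters that are not taken from the paper: match +2, mismatch -1, and a linear gap penalty of -1. The function name `smith_waterman_score` is hypothetical. Each cell is H(i, j) = max(0, H(i-1, j-1) + s(a_i, b_j), H(i-1, j) + gap, H(i, j-1) + gap). The table is filled one anti-diagonal at a time because cells on the same anti-diagonal do not depend on each other. That independence is what a GPU implementation can exploit by assigning the cells of one anti-diagonal to parallel threads. Note that this wavefront scheme is a common strategy and not necessarily the one used in this paper, and the sketch itself runs sequentially.

```python
def smith_waterman_score(s1: str, s2: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of s1 and s2 with a linear gap penalty.

    Hypothetical helper; the scoring parameters are illustrative assumptions.
    """
    n, m = len(s1), len(s2)
    # H[i][j] = best score of a local alignment ending at s1[i-1] and s2[j-1].
    # Row 0 and column 0 stay 0 (alignment against an empty prefix).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Visit anti-diagonals d = i + j in order. A cell on diagonal d reads only
    # cells on diagonals d - 1 and d - 2, so the inner loop's iterations are
    # independent (on a GPU, one thread per cell of the current diagonal).
    for d in range(2, n + m + 1):
        # Keep 1 <= i <= n and 1 <= j = d - i <= m.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            diag = H[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # s1[i-1] aligned against a gap
            left = H[i][j - 1] + gap  # s2[j-1] aligned against a gap
            # The 0 lets a local alignment start afresh at any cell.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

With these assumed parameters, `smith_waterman_score("pqaxabcstrqrtp", "xyaxbacsl")` returns 8. That score comes from aligning axabcs with axbacs as ax-abcs over axba-cs: five matches and two gaps.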

2 Figures and Tables

Cite this paper

@inproceedings{Venkatachalam2012ParallelizingTS, title={Parallelizing the Smith-Waterman Local Alignment Algorithm using CUDA}, author={Balaji Venkatachalam}, year={2012} }