Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations

Abstract

In the post-genomic era there has been an explosion in the amount of genomic data available, and the primary research problems have shifted from producing interesting biological data to processing and storing this information efficiently. In this paper we present efficient data structures and algorithms for the High Similarity Sequencing Problem, in which we are given the sequences $S_0, S_1, \ldots, S_k$, where $S_j = e_{j1} I_{\sigma_1} e_{j2} I_{\sigma_2} e_{j3} I_{\sigma_3} \cdots e_{j\ell} I_{\sigma_\ell}$, and must perform pattern matching on the set of sequences. We present time- and memory-efficient data structures that exploit the extensive similarity of the sequences. Our solution leads to a query time of $O\left(m + vk \log \ell + \frac{m \cdot occ_v \cdot v}{w} + \frac{PSC(p) \cdot m}{w}\right)$ with a memory usage of $O(N \log N + vk \log vk)$.
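The "word level operations" of the title can be illustrated with a generic bit-parallel matcher. The sketch below is the classic Shift-And algorithm, not the authors' data structure; the function name `shift_and_search` is hypothetical. It assumes, as is conventional, that w denotes the machine word size, so that a pattern of length m occupies about m/w words and each text character is processed with a constant number of operations per word. Python integers are arbitrary precision, so a single integer stands in for those words here.

```python
def shift_and_search(text, pattern):
    """Return the start positions of all exact occurrences of pattern in text.

    Illustrative Shift-And matcher (an assumption for this example,
    not the method of the paper).
    """
    m = len(pattern)
    if m == 0:
        return []

    # Binary encoding of the pattern: bit i of masks[c] is set
    # exactly when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []

    for pos, c in enumerate(text):
        # Extend every partial match by one character, start a new
        # match at this position, and keep only those consistent with c.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

For example, `shift_and_search("ACGTACGT", "CGT")` returns `[1, 5]`. In a language with fixed-width integers the state would be stored in an array of w-bit words, which is where a factor of the form m/w in a running time typically comes from.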

DOI: 10.1007/978-3-642-33412-2_60

Cite this paper

@inproceedings{Alatabbi2012QueryingHS,
  title={Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations},
  author={Ali Alatabbi and Carl Barton and Costas S. Iliopoulos and Laurent Mouchard},
  booktitle={AIAI},
  year={2012}
}