Mapping sequences by parts

Abstract

We present the N-map method, a pairwise, asymmetric approach that compares sequences while accounting for evolutionary events that produce shuffled, reversed or repeated elements. The optimal N-map of a sequence s over a sequence t is the best way of partitioning s into N parts and placing them, possibly reverse-complemented, over t so as to maximize the sum of their gapless alignment scores. We introduce an algorithm that computes an optimal N-map in O(|s| × |t| × N) time using O(|s| × |t| × N) memory. Among all numbers of parts in a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with reasonable accuracy. We test the approach on random sequences to which we apply artificial evolutionary events. The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.
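The definition of an optimal N-map lends itself to a dynamic program. The sketch below is a minimal illustration, not the authors' algorithm. It makes several assumptions that do not come from the abstract: parts are contiguous and cover all of s, each placed part must lie entirely within t, scoring is a simple match/mismatch scheme, and the alphabet is DNA. The names `nmap_score`, `match` and `mismatch` are hypothetical. It returns only the optimal score, so it keeps two rows of the table rather than the full O(|s| × |t| × N) table that recovering the placements themselves would require.

```python
NEG = float("-inf")
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}  # assumed DNA alphabet


def nmap_score(s, t, n_parts, match=1, mismatch=-1):
    """Best total gapless score for mapping s onto t in exactly n_parts parts.

    Illustrative sketch: every part must fit inside t, and the
    match/mismatch scoring is an assumption made for this example.
    """
    m, n = len(s), len(t)
    if not 1 <= n_parts <= m or n == 0:
        raise ValueError("need 1 <= n_parts <= len(s) and a non-empty t")

    def sc(a, b):
        return match if a == b else mismatch

    # best[k][i]: best score for s[:i] split into exactly k placed parts.
    best = [[NEG] * (m + 1) for _ in range(n_parts + 1)]
    best[0][0] = 0
    # fwd[k][j]: s[:i] in k parts, last part direct, s[i-1] placed on t[j].
    # rev[k][j]: same, but the last part is reverse-complemented.
    fwd = [[NEG] * n for _ in range(n_parts + 1)]
    rev = [[NEG] * n for _ in range(n_parts + 1)]

    for i in range(1, m + 1):
        new_fwd = [[NEG] * n for _ in range(n_parts + 1)]
        new_rev = [[NEG] * n for _ in range(n_parts + 1)]
        for k in range(1, n_parts + 1):
            start = best[k - 1][i - 1]  # open a new part at s[i-1]
            for j in range(n):
                # Direct part: s advances with t, so extend from t[j-1].
                ext_f = fwd[k][j - 1] if j > 0 else NEG
                # Reverse-complemented part: s advances against t.
                ext_r = rev[k][j + 1] if j + 1 < n else NEG
                f = max(ext_f, start) + sc(s[i - 1], t[j])
                r = max(ext_r, start) + sc(s[i - 1], COMP.get(t[j]))
                new_fwd[k][j] = f
                new_rev[k][j] = r
                best[k][i] = max(best[k][i], f, r)
        fwd, rev = new_fwd, new_rev

    return best[n_parts][m]
```

The three nested loops over positions of s, positions of t and the number of parts give the O(|s| × |t| × N) running time quoted above. For example, with s = "GGGAAA" and t = "AAAGGG", a single part cannot match anywhere, whereas two parts recover the shuffled blocks exactly.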

DOI: 10.1186/1748-7188-2-11

Cite this paper

@article{Didier2007MappingSB, title={Mapping sequences by parts}, author={Gilles Didier and Carito Guziolowski}, journal={Algorithms for molecular biology : AMB}, year={2007}, volume={2}, pages={11 - 11} }