Common Sequence Structure Properties and Stable Regions in RNA Secondary Structures
Arc-annotated sequences are useful for representing the structural information of RNA sequences. In general, an RNA secondary structure can be represented as a set of nested arcs, and an RNA tertiary structure as a set of crossing arcs. Since RNA function is largely determined by molecular conformation, and therefore by secondary and tertiary structure, the comparison of RNA secondary and tertiary structures has received much attention recently. In this paper, we propose a notion of edit distance to measure the similarity between two RNA secondary or tertiary structures, incorporating various edit operations performed on both bases and arcs (i.e., base pairs). Several algorithms are presented that compute the edit distance between two RNA sequences with various arc structures and under various score schemes, either exactly or approximately, with provably good performance. Preliminary experimental tests indicate that our definition of edit distance and the computation model are among the most reasonable proposed in the literature.
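To make the distinction between nested and crossing arcs concrete, the following is a minimal illustrative sketch and not the algorithm of this paper. It assumes that an arc is given as a pair of base positions (i, j) with i < j and that no two arcs share an endpoint; the names `arcs_cross` and `is_nested` and the example arc sets are hypothetical.

```python
from itertools import combinations


def arcs_cross(a, b):
    """Return True if arcs a and b cross.

    Assumes each arc is a pair (i, j) with i < j and that the two arcs
    share no endpoint. After ordering by left endpoint, two arcs
    (i, j) and (k, l) cross exactly when i < k < j < l.
    """
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def is_nested(arcs):
    """Return True if no pair of arcs crosses (secondary-structure-like)."""
    return not any(arcs_cross(a, b) for a, b in combinations(arcs, 2))


# Hypothetical arc sets over base positions 0..8.
nested_arcs = [(0, 8), (1, 7), (2, 6)]   # every arc lies inside the previous one
crossing_arcs = [(0, 5), (3, 8)]         # 0 < 3 < 5 < 8, so the arcs cross

print(is_nested(nested_arcs))
print(is_nested(crossing_arcs))
```

Under these assumptions, an arc set for which `is_nested` holds corresponds to the nested (secondary structure) case, while any arc set containing a crossing pair falls into the crossing (tertiary structure) case.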