An efficient algorithm for identifying matches with errors in multiple long molecular sequences.

@article{Leung1991AnEA,
  title={An efficient algorithm for identifying matches with errors in multiple long molecular sequences.},
  author={M. Y. Y. Leung and B. Edwin Blaisdell and Christopher Burge and Samuel Karlin},
  journal={Journal of Molecular Biology},
  year={1991},
  volume={221},
  number={4},
  pages={1367--1378}
}
An efficient algorithm is described for finding matches, repeats and other word relations, allowing for errors, in large data sets of long molecular sequences. The algorithm entails hashing on fixed-size words in conjunction with the use of a linked list connecting all occurrences of the same word. The average memory and run time requirements both increase almost linearly with the total sequence length. Some results of the program's performance on a database of Escherichia coli DNA sequences are…
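The hashing-plus-linked-list idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' program: the names `build_word_index`, `occurrences`, `head`, `occ` and `link` are hypothetical, a Python dict stands in for the hash table, and only exact word occurrences are chained (the error-tolerant extension of matches is not shown).

```python
def build_word_index(seqs, k):
    """Hash every k-letter word and chain its occurrences in a linked list.

    Assumed layout (illustrative only, not taken from the paper):
      head[word] -> index in `occ` of the most recent occurrence of `word`
      occ[i]     -> (sequence id, offset) of the i-th word scanned
      link[i]    -> index of the previous occurrence of the same word, or -1
    """
    head = {}
    occ = []
    link = []
    for sid, seq in enumerate(seqs):
        for off in range(len(seq) - k + 1):
            word = seq[off:off + k]
            # Point the new entry at the previous occurrence of this word.
            link.append(head.get(word, -1))
            occ.append((sid, off))
            head[word] = len(occ) - 1
    return head, occ, link


def occurrences(word, head, occ, link):
    """Walk the linked list for `word`; return positions in scan order."""
    out = []
    i = head.get(word, -1)
    while i != -1:
        out.append(occ[i])
        i = link[i]
    out.reverse()  # the chain is stored newest-first
    return out


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGA"]
    head, occ, link = build_word_index(seqs, k=4)
    print(occurrences("ACGT", head, occ, link))
    print(occurrences("TACG", head, occ, link))
```

In this sketch each word position is visited once and stored once, so the memory used by `occ` and `link` grows linearly with the total sequence length, which is in the spirit of the near-linear behaviour the abstract reports.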