Distributional regimes for the number of k-word matches between two random sequences.

@article{Lippert2002DistributionalRF,
  title={Distributional regimes for the number of k-word matches between two random sequences.},
  author={Ross Lippert and Haiyan Huang and Michael S. Waterman},
  journal={Proceedings of the National Academy of Sciences of the United States of America},
  year={2002},
  volume={99},
  number={22},
  pages={13980--13989}
}
When comparing two sequences, a natural approach is to count the number of k-letter words the two sequences have in common. No positional information is used in the count, but the statistic has the virtue that the comparison time is linear in sequence length. For this reason, this statistic, D2, and certain transformations of D2 are used for EST sequence database searches. In this paper we begin the rigorous study of the statistical distribution of D2. Using an independence model of DNA sequences, we…
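To make the statistic concrete, here is a minimal sketch, assuming plain Python strings over the DNA alphabet. D2 counts the position pairs (i, j) at which the k-word starting at i in one sequence equals the k-word starting at j in the other. Summing count_A(w) * count_B(w) over all k-words w gives the same number while touching each sequence only once, which is where the linear running time (for fixed k) comes from. All function names are hypothetical illustrations, not taken from the paper. The `expected_d2_iid` helper is our own illustration of the independence model, not a result quoted from the truncated abstract: if letters are i.i.d. within and across both sequences, each of the (n-k+1)(m-k+1) position pairs matches with probability (sum_a p_a^2)^k.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """D2 via word counts: sum over k-words w of count_a(w) * count_b(w).

    Runs in time linear in len(a) + len(b) for fixed k (hashing each k-word).
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Iterate over the smaller table; a Counter returns 0 for absent words.
    if len(ca) > len(cb):
        ca, cb = cb, ca
    return sum(n * cb[w] for w, n in ca.items())


def d2_brute_force(a: str, b: str, k: int) -> int:
    """Quadratic reference: test every pair of start positions (i, j)."""
    return sum(
        1
        for i in range(len(a) - k + 1)
        for j in range(len(b) - k + 1)
        if a[i:i + k] == b[j:j + k]
    )


def expected_d2_iid(n: int, m: int, k: int, probs: dict) -> float:
    """Mean of D2 for lengths n and m under an assumed i.i.d. letter model.

    probs maps each letter to its probability; a single position pair of
    letters agrees with probability p2 = sum_a p_a^2, so a k-word pair
    agrees with probability p2 ** k.
    """
    p2 = sum(p * p for p in probs.values())
    return max(n - k + 1, 0) * max(m - k + 1, 0) * p2 ** k


if __name__ == "__main__":
    # Shared 2-words between ACGT and ACGA are AC and CG, once each.
    print(d2("ACGT", "ACGA", 2))  # 2
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(expected_d2_iid(4, 4, 1, uniform))  # 4.0
```

The brute-force version is included only as a correctness check; it makes the quadratic cost that the word-count formulation avoids explicit.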

7 Figures & Tables

Semantic Scholar estimates that this publication has 145 citations based on the available data.