Improved robustness of signature-based near-replica detection via lexicon randomization

@inproceedings{Kolcz2004ImprovedRO,
  title={Improved robustness of signature-based near-replica detection via lexicon randomization},
  author={Aleksander Kolcz and Abdur Chowdhury and Joshua Alspector},
  booktitle={KDD},
  year={2004}
}
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional duplicate detection techniques relying on direct inter-document similarity computation (e.g., using the cosine measure) are often not feasible given the time and memory performance constraints. On the other hand, fingerprint-based methods, such as I-Match, are very attractive computationally but may be brittle with…