Comparison of text-based methods for detecting duplication in document image databases

@inproceedings{Lopresti2000ComparisonOT,
  title={Comparison of text-based methods for detecting duplication in document image databases},
  author={Daniel P. Lopresti},
  booktitle={Document Recognition and Retrieval},
  year={2000}
}
This paper presents an experimental evaluation of several text-based methods for detecting duplication in document image databases using uncorrected OCR output. This task is challenging because of both the wide range of degradations printed documents can suffer and conflicting interpretations of what it means to be a "duplicate." We report results for five sets of experiments exploring various aspects of the problem space. While the techniques studied are generally robust in the face of most…

