The Detection of Duplicates in Document Image Databases

David S. Doermann, Huiping Li, and Omid E. Kia

Document imaging technology has developed to the point where it is not uncommon for organizations to scan large numbers of documents into databases with little or no index information. This may be done for archival purposes, in which case the necessary index may be as simple as a case number, or with the ultimate goal of automatically extracting index information for content-based retrieval. Maintaining the integrity of such a database is difficult, especially in a distributed environment where…


