Scalable Document Fingerprinting (Extended Abstract)
@inproceedings{Nevin1996ScalableDF, title={Scalable Document Fingerprinting (Extended Abstract)}, author={Nevin Heintze}, year={1996} }
As more information becomes available electronically, document search based on textual similarity is becoming increasingly important, not only for locating documents online, but also for addressing internet variants of old problems such as plagiarism and copyright violation. This paper presents an online system that provides reliable search results using modest resources and scales up to data sets of the order of a million documents. Our system provides a practical compromise between storage…
21 Citations
Methods for Identifying Versioned and Plagiarized Documents
- Computer Science
- J. Assoc. Inf. Sci. Technol.
- 2003
- 370
- Highly Influenced
A Sentence-Based Copy Detection Approach for Web Documents
- Computer Science
- FSKD
- 2005
- 62
- Highly Influenced
A Dual-Method Model for Copy Detection
- Computer Science
- 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops
- 2006
- 8
An efficient method to detect duplicates of web documents with the use of inverted index
- Computer Science
- WWW 2002
- 2002
- 25
Detecting similar HTML documents using a fuzzy set information retrieval approach
- Computer Science
- 2005 IEEE International Conference on Granular Computing
- 2005
- 24