Document overlap detection system for distributed digital libraries

@inproceedings{Monostori2000DocumentOD,
  title={Document overlap detection system for distributed digital libraries},
  author={K. Monostori and A. Zaslavsky and H. Schmidt},
  booktitle={DL '00},
  year={2000}
}
In this paper we introduce the MatchDetectReveal (MDR) system, which is capable of identifying overlapping and plagiarised documents. Each component of the system is briefly described. The matching-engine component uses a modified suffix tree representation, which is able to identify the exact overlapping chunks; its performance is also presented.
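To illustrate what "identifying the exact overlapping chunks" can mean in practice, here is a minimal Python sketch. It is not the MDR matching engine: the paper uses a modified suffix tree, whereas this sketch swaps in a plain quadratic dynamic-programming table over word tokens, which is only practical for small inputs. The function name `overlapping_chunks` and the `min_words` threshold are hypothetical and chosen for illustration only.

```python
def overlapping_chunks(doc_a, doc_b, min_words=3):
    """Return the maximal word runs shared verbatim by two texts.

    Illustrative stand-in only: a simple O(n*m) table, NOT the modified
    suffix tree described in the paper.
    """
    a = doc_a.lower().split()
    b = doc_b.lower().split()

    # run[i][j] = length of the common word run ending at a[i-1] and b[j-1]
    run = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1

    chunks = set()
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            n = run[i][j]
            # Skip runs that continue with the next word pair (not maximal).
            extends = i < len(a) and j < len(b) and a[i] == b[j]
            if n >= min_words and not extends:
                chunks.add(" ".join(a[i - n:i]))
    return sorted(chunks)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "a quick brown fox sleeps near the lazy dog today"
    print(overlapping_chunks(original, suspect))
```

A suffix-tree-based engine would report the same kind of exact chunks while avoiding the pairwise table, which matters once collections grow beyond a handful of documents.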
Signature Extraction for Overlap Detection in Documents
TLDR
Describes a web-accessible text registry based on signature extraction: it extracts a small but diagnostic signature from each registered text for permanent storage and comparison against other stored signatures.
Partial Plagiarism Detection Using String Matching with Mismatches
TLDR
This work proposes a method that detects partial copies in documents without a query; some partial copies were detected in test documents.
PPChecker: Plagiarism Pattern Checker in Document Copy Detection
TLDR
Experiments performed on the CISI document collection show that PPChecker produces better decision information for document copy detection than existing systems.
Evaluation of Document Comparing Mechanisms
Digital libraries have made access to documents very easy, but this also makes documents vulnerable to being copied. The illegal distribution of documents discourages authors/news feed services to ...
Efficient Computational Approach to Identifying Overlapping Documents in Large Digital Collections
TLDR
A new data structure is proposed that has the same versatility as a suffix tree but requires less space than any other representation known to date; it is called a suffix vector because of the way it is organised in memory.
Near Similarity Search and Plagiarism Analysis
TLDR
This work states that an overlap of two documents’ fingerprints indicates a possibly plagiarized text passage, and uses MD5 hashes to construct fingerprints to identify matching passages.
A Survey on Natural Language Text Copy Detection
TLDR
A comprehensive survey on natural language text copy detection is given, the developments of copy detection are introduced, and some key detection techniques are listed and compared with each other.
A Copy Detection Method Based on SCAM and PPCHECKER
TLDR
A schema for detecting copies, including partial copies, is proposed based on the SCAM and PPCHECKER methods, which benefits from the advantages of both methods.
An improved plagiarism detection scheme based on semantic role labeling
TLDR
The method significantly outperforms the modern methods for plagiarism detection in terms of Recall, Precision and F-measure, and a weighting for each argument generated by SRL is introduced to study its behaviour.
BAENPD: A Bilingual Plagiarism Detector
TLDR
A bilingual plagiarism detection system, BAENPD, that can detect plagiarism in electronic Bangla and English documents is presented; it is found that the system can efficiently detect plagiarism between English and Bangla documents as well as between documents in the same language.

References

Parallel Overlap and Similarity Detection in Semi-Structured Document Collections
TLDR
The problems of using parallel and cluster computing systems for detecting plagiarism in large collections of semi-structured electronic texts, including software written in formal languages at one end of the spectrum and natural language texts at the other end, are discussed.
SCAM: A Copy Detection Mechanism for Digital Documents
TLDR
A new scheme for detecting copies, based on comparing the word frequency occurrences of the new document against those of registered documents, is presented, and an experimental comparison between this scheme and COPS, a detection scheme based on sentence overlap, is reported.
Algorithms on strings, trees, and sequences
TLDR
Ukkonen’s method is the method of choice for most problems requiring the construction of a suffix tree, and it will be presented first because it is easier to understand.
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology
TLDR
The author examines the importance of (sub)sequence comparison in molecular biology, core string edits, alignments and dynamic programming, and takes a deeper look at classical methods for exact string matching.
Plagiarism.org, the Internet plagiarism detection service. URL http://www.plagiarism.org, 1999