Strategies for retrieving plagiarized documents


For the identification of plagiarized passages in large document collections, we present retrieval strategies that rely on stochastic sampling and chunk indexes. Using the entire Wikipedia corpus, we compile n-gram indexes and compare them to a new kind of fingerprint index in a plagiarism analysis use case. Our index provides an analysis speed-up by a factor…
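To make the idea of chunk indexes with stochastic sampling concrete, here is a minimal sketch. It assumes word n-grams as chunks, SHA-1 as the chunk hash, and uniform random sampling of query chunks from the suspicious document. The function names (`word_ngrams`, `chunk_hash`, `build_chunk_index`, `retrieve_candidates`) and the parameters `n` and `sample_size` are hypothetical. The sketch illustrates a generic exact-hash chunk index. It is not the fingerprint index the abstract introduces, and it does not reproduce the authors' retrieval strategies.

```python
import hashlib
import random
from collections import defaultdict


def word_ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Split text into overlapping word n-grams (the 'chunks')."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def chunk_hash(chunk: tuple[str, ...]) -> str:
    """Exact (not fuzzy) hash of one chunk, used as the index key."""
    return hashlib.sha1(" ".join(chunk).encode("utf-8")).hexdigest()


def build_chunk_index(corpus: dict[str, str], n: int) -> dict[str, set[str]]:
    """Map each chunk hash to the set of document IDs containing that chunk."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for chunk in word_ngrams(text, n):
            index[chunk_hash(chunk)].add(doc_id)
    return index


def retrieve_candidates(suspicious: str, index: dict[str, set[str]],
                        n: int, sample_size: int, seed: int = 0) -> dict[str, int]:
    """Query the index with a random sample of chunks; count hits per document."""
    chunks = word_ngrams(suspicious, n)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = rng.sample(chunks, min(sample_size, len(chunks)))
    hits: dict[str, int] = defaultdict(int)
    for chunk in sample:
        # .get() does not insert missing keys into the defaultdict
        for doc_id in index.get(chunk_hash(chunk), ()):
            hits[doc_id] += 1
    return dict(hits)


# Hypothetical toy corpus and suspicious text, for illustration only.
corpus = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "completely unrelated text about something else entirely",
}
index = build_chunk_index(corpus, n=3)
suspicious = "we saw the quick brown fox jumps over the fence"
candidates = retrieve_candidates(suspicious, index, n=3, sample_size=100)
print(candidates)  # {'a': 5}
```

Here `sample_size` exceeds the eight trigrams in the suspicious text, so every chunk is queried. With a smaller `sample_size`, only a random subset is looked up, which trades recall for fewer index accesses. Documents with many hits become candidates for a detailed plagiarism analysis. Because exact hashes only match identical chunks, lightly reworded passages are missed; tolerating such changes is the kind of problem that fuzzy fingerprinting schemes aim to address.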
DOI: 10.1145/1277741.1277928


4 Figures and Tables

