Strategies for retrieving plagiarized documents

Abstract

For the identification of plagiarized passages in large document collections, we present retrieval strategies that rely on stochastic sampling and chunk indexes. Using the entire Wikipedia corpus, we compile n-gram indexes and compare them to a new kind of fingerprint index in a plagiarism analysis use case. Our index provides an analysis speed-up by a factor…
DOI: 10.1145/1277741.1277928
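The abstract names two ingredients, chunk indexes and stochastic sampling. A minimal sketch of how they can fit together, assuming a plain word n-gram chunk index rather than the authors' fingerprint index (whose details are not given here): source documents are split into overlapping word n-grams, each chunk is hashed into an inverted index, and a suspicious document is checked by looking up only a random sample of its chunks. The names `ChunkIndex`, `word_ngrams`, `chunk_hash`, the default `n=4`, `sample_size`, and the choice of CRC32 as the hash are all hypothetical illustrations, not taken from the paper.

```python
import random
import re
import zlib
from collections import defaultdict


def word_ngrams(text, n):
    """Split text into lowercase word n-grams (hypothetical, simple tokenizer)."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def chunk_hash(chunk):
    """Stable 32-bit hash of a chunk; CRC32 is an arbitrary illustrative choice."""
    return zlib.crc32(chunk.encode("utf-8"))


class ChunkIndex:
    """Hypothetical inverted index mapping hashed word n-grams to document ids."""

    def __init__(self, n=4):
        self.n = n
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index every overlapping n-gram chunk of a source document.
        for chunk in word_ngrams(text, self.n):
            self.postings[chunk_hash(chunk)].add(doc_id)

    def candidates(self, suspicious_text, sample_size=50, seed=0):
        """Look up a random sample of the suspicious text's chunks and rank
        source documents by how many sampled chunks they share."""
        chunks = word_ngrams(suspicious_text, self.n)
        rng = random.Random(seed)
        # Stochastic sampling: query at most sample_size chunks, not all of them.
        sample = rng.sample(chunks, min(sample_size, len(chunks)))
        hits = defaultdict(int)
        for chunk in sample:
            # .get avoids creating empty posting lists for unseen hashes.
            for doc_id in self.postings.get(chunk_hash(chunk), ()):
                hits[doc_id] += 1
        # Most shared chunks first; ties broken by document id.
        return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, after indexing two short texts with `n=3`, a query sharing three word trigrams with one of them returns that document with a hit count of 3. Sampling trades recall for speed: querying fewer chunks reduces index lookups, at the risk of missing short copied passages.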

4 Figures and Tables