Comparative evaluation of text- and citation-based plagiarism detection approaches using GuttenPlag

@inproceedings{Gipp2011ComparativeEO,
  title={Comparative evaluation of text- and citation-based plagiarism detection approaches using {GuttenPlag}},
  author={Bela Gipp and Norman Meuschke and J. Beel},
  booktitle={JCDL '11},
  year={2011}
}
Various approaches for plagiarism detection exist. All are based on more or less sophisticated text analysis methods such as string matching, fingerprinting or style comparison. In this paper, a new approach called Citation-based Plagiarism Detection is evaluated using a doctoral thesis, in which a volunteer crowd-sourcing project called GuttenPlag identified substantial amounts of plagiarism through careful manual inspection. This new approach is able to identify similar and plagiarized…
Citation-based plagiarism detection - idea, implementation and evaluation
  • Bela Gipp
  • Computer Science
  • Bull. IEEE Tech. Comm. Digit. Libr.
  • 2012
TLDR
Citation-based Plagiarism Detection is by no means a replacement for currently used text-based approaches; rather, it should be considered a complement for identifying well-disguised plagiarism that is currently hard to find.
Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence
TLDR
Three algorithms, Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence, are introduced, and it is shown that when they are combined, common forms of plagiarism can be detected reliably.
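Of the three algorithms, Longest Common Citation Sequence is the easiest to picture. It can be read as the textbook longest-common-subsequence dynamic program, run over the order in which two documents cite their sources. The sketch below assumes that each document has already been reduced to a list of citation keys in reading order. The function name `lccs` and the single-letter keys are hypothetical, and this is an illustration rather than the authors' implementation.

```python
from typing import Sequence


def lccs(a: Sequence[str], b: Sequence[str]) -> list[str]:
    """Return one longest common subsequence of two citation sequences.

    Matched citations keep their relative order but need not be adjacent,
    so citations inserted or dropped between them do not break the match.
    """
    n, m = len(a), len(b)
    # dp[i][j] = length of the longest common citation sequence of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal sequence.
    result: list[str] = []
    i, j = 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


if __name__ == "__main__":
    # Hypothetical citation keys in reading order.
    source = ["A", "B", "C", "D", "E"]
    suspect = ["B", "X", "C", "E"]
    print(lccs(source, suspect))  # ['B', 'C', 'E']
```

The match need not be contiguous. A citation that was dropped (`D`) or inserted (`X`) therefore does not break the alignment, which helps with reordered or paraphrased text.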
Hybrid technique for plagiarism detection based on text and citation comparison
Plagiarism is a “stealing of academic assets”. In earlier days, numerous text documents were accessible on the web and were effortless to access. Owing to this large…
Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space
TLDR
It is shown that a hybrid approach makes semantic plagiarism detection feasible, even on large collections, for the first time. The approach integrates detection methods using citations, semantic argument structure, and semantic word similarity with character-based methods to achieve higher detection performance for disguised plagiarism forms.
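One way to picture using citation characteristics to limit the retrieval space is a bibliographic-coupling prefilter. Only corpus documents that share at least a minimum number of cited sources with the suspicious document are passed on to the costlier character-based or semantic comparison. This is a minimal sketch under that assumption. The threshold `min_shared`, the function names and the corpus are hypothetical, and the paper's actual candidate-selection criteria may differ.

```python
from collections import defaultdict
from typing import Iterable


def build_citation_index(corpus: dict[str, list[str]]) -> dict[str, set[str]]:
    """Inverted index: cited source -> corpus documents that cite it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, cited_sources in corpus.items():
        for source in cited_sources:
            index[source].add(doc_id)
    return index


def candidate_documents(
    suspicious_refs: Iterable[str],
    index: dict[str, set[str]],
    min_shared: int = 2,  # hypothetical threshold, not taken from the paper
) -> dict[str, int]:
    """Corpus documents sharing at least `min_shared` cited sources, with counts."""
    shared: dict[str, int] = defaultdict(int)
    for source in set(suspicious_refs):  # count each reference once
        for doc_id in index.get(source, set()):
            shared[doc_id] += 1
    return {doc_id: n for doc_id, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    # Hypothetical corpus: document id -> ids of the sources it cites.
    corpus = {"d1": ["A", "B", "C"], "d2": ["C", "D"], "d3": ["A", "C", "E"]}
    index = build_citation_index(corpus)
    print(candidate_documents(["A", "C", "F"], index))
```

Here only `d1` and `d3` share two sources with the suspicious document, so only they would reach the expensive comparison step. `d2` shares a single source and is skipped.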
State-of-the-art in detecting academic plagiarism
TLDR
In the future, plagiarism detection systems may benefit from combining traditional character-based detection methods with emerging detection approaches, including intrinsic, cross-lingual and citation-based plagiarism detection.
CitePlag: A Citation-based Plagiarism Detection System Prototype
TLDR
An open-source prototype of a citation-based plagiarism detection system, called CitePlag, is presented. It evaluates the citations of academic documents as language-independent markers for detecting plagiarism.
On the development of a plagiarism detection model based on citation analysis using a bibliographic database
TLDR
A step-by-step algorithm for querying the Web of Science and Scopus databases and analyzing the returned data is recommended for integration into plagiarism detection systems as an additional component.
Demonstration of citation pattern analysis for plagiarism detection
TLDR
State-of-the-art plagiarism detection approaches capably identify copy-and-paste plagiarism and, to some extent, slightly modified plagiarism. They cannot, however, reliably identify strongly disguised forms such as paraphrases, translated plagiarism, and idea plagiarism.
Citation‐based plagiarism detection: Practicability on a large‐scale scientific corpus
TLDR
Evaluation of CbPD in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles shows that the citation‐based approach achieves superior ranking performance for heavily disguised plagiarism forms and is demonstrated to be computationally more efficient than character‐based approaches.
On the use of character n-grams as the only intrinsic evidence of plagiarism
TLDR
It is demonstrated empirically that low- and high-frequency n-grams are not equally relevant for intrinsic plagiarism detection, and that their performance depends on how they are exploited.
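The summary does not spell out the paper's exact features or measure. As a generic illustration of character n-grams serving as intrinsic evidence, the sketch compares each text window's character n-gram profile with the profile of the whole document, using a normalized profile dissimilarity in [0, 1]. Windows that score high deviate stylistically from the rest of the document. The window size, step, `n`, and all function names are hypothetical choices, not the cited authors' settings.

```python
from collections import Counter


def char_ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Relative frequency of each character n-gram in `text`."""
    grams = [text[i : i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def profile_dissimilarity(window: dict[str, float], document: dict[str, float]) -> float:
    """Normalized dissimilarity of a window profile against a document profile.

    Each n-gram of the window contributes (2 * (fw - fd) / (fw + fd)) ** 2,
    which lies in [0, 4]; dividing by 4 * |window| keeps the score in [0, 1].
    """
    if not window:
        return 0.0
    total = 0.0
    for gram, fw in window.items():
        fd = document.get(gram, 0.0)
        total += (2 * (fw - fd) / (fw + fd)) ** 2
    return total / (4 * len(window))


def window_scores(text: str, size: int = 200, step: int = 100, n: int = 3) -> list[float]:
    """Score sliding windows (hypothetical size/step) against the whole document."""
    doc_profile = char_ngram_profile(text, n)
    starts = range(0, max(len(text) - size, 0) + 1, step)
    return [
        profile_dissimilarity(char_ngram_profile(text[s : s + size], n), doc_profile)
        for s in starts
    ]


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog. " * 5 + "XQZJ VWKP QQXZ JJVK. " * 2
    print(window_scores(text, size=60, step=30))
```

Whether low- or high-frequency n-grams should dominate such a profile is exactly the question the paper studies. The sketch simply keeps all of them.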

References

Showing 1-10 of 27 references
Citation based plagiarism detection: a new approach to identify plagiarized work language independently
TLDR
This approach is based on citation analysis and allows duplicate and plagiarism detection even if a document has been paraphrased or translated, since the relative position of citations remains similar.
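The language independence claimed here rests on reducing a document to the order of its citations. This is a minimal sketch under two assumptions. First, citations are written as numeric bracket markers such as `[3]` or `[1, 4]`. Second, each document's bibliography maps its own marker numbers to shared source identifiers. The regular expression, the function name `citation_sequence` and the ids `S1` to `S3` are hypothetical, and real citation parsers must handle many more styles.

```python
import re

# Hypothetical pattern for numeric citation markers such as "[3]" or "[1, 4]".
CITATION_MARKER = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def citation_sequence(text: str, bibliography: dict[int, str]) -> list[str]:
    """Cited sources in reading order, mapped to document-independent ids."""
    sequence: list[str] = []
    for match in CITATION_MARKER.finditer(text):
        for number in match.group(1).split(","):
            source_id = bibliography.get(int(number))  # int() ignores surrounding spaces
            if source_id is not None:
                sequence.append(source_id)
    return sequence


if __name__ == "__main__":
    # The same passage in English and German; the two bibliographies number
    # the hypothetical sources S1-S3 differently.
    english = "String matching [1] fails on paraphrases [2, 3]."
    german = "Zeichenkettenvergleich [3] versagt bei Paraphrasen [1, 2]."
    print(citation_sequence(english, {1: "S1", 2: "S2", 3: "S3"}))  # ['S1', 'S2', 'S3']
    print(citation_sequence(german, {1: "S2", 2: "S3", 3: "S1"}))  # ['S1', 'S2', 'S3']
```

The words differ entirely between the two passages, yet both reduce to the same citation sequence. A sequence comparison can then operate on that shared sequence.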
Intrinsic plagiarism analysis
TLDR
This work investigates whether a computer program can detect plagiarism when no reference can be provided, e.g., when the foreign sections stem from a book that is not available in digital form.
An Evaluation Framework for Plagiarism Detection
TLDR
Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.
Detecting Short Passages of Similar Text in Large Document Collections
TLDR
The method exploits the characteristic distribution of word trigrams and uses similarity measures based on set-theoretic principles. It has been successfully used to detect plagiarism in students’ work.
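A set-theoretic comparison of word trigrams can be sketched as follows. Each text becomes the set of its overlapping three-word sequences, and two texts are compared by the Jaccard resemblance of those sets. An asymmetric containment score is also shown. This is a minimal sketch that assumes lower-cased words with punctuation stripped. The summary does not say which exact set measures the paper uses, so `resemblance` and `containment` are illustrative choices.

```python
import re


def word_trigrams(text: str) -> set[tuple[str, ...]]:
    """Set of overlapping word trigrams, lower-cased, punctuation ignored."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i : i + 3]) for i in range(len(words) - 2)}


def resemblance(a: str, b: str) -> float:
    """Jaccard resemblance: shared trigrams divided by all distinct trigrams."""
    ta, tb = word_trigrams(a), word_trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def containment(a: str, b: str) -> float:
    """Share of a's trigrams that also occur in b (asymmetric)."""
    ta, tb = word_trigrams(a), word_trigrams(b)
    return len(ta & tb) / len(ta) if ta else 0.0


if __name__ == "__main__":
    a = "the cat sat on the mat"
    b = "the cat sat on a mat"
    print(resemblance(a, b), containment(a, b))  # 0.3333333333333333 0.5
```

Each text has four trigrams, and two of them are shared. That gives a resemblance of 2/6 and a containment of 2/4.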
Plagiarism - A Survey
TLDR
This paper discusses the complex general setting, then reports on some results of plagiarism detection software, and draws attention to the fact that any serious investigation of plagiarism turns up rather unexpected side effects.
Methods for Identifying Versioned and Plagiarized Documents
TLDR
The identity measure and the best fingerprinting technique are both able to accurately identify coderivative documents, and it is demonstrated that the identity measure is clearly superior for fingerprinting parameters.
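The summary does not name the fingerprinting variants the paper compares. A generic document fingerprint hashes every word n-gram and keeps only a deterministic subset of the hashes, here those divisible by `p` ("0 mod p" selection). Two documents are then compared by the overlap of their kept hashes. This is a minimal sketch under that assumption. CRC-32 (`zlib.crc32`) is a stand-in hash, the defaults for `n` and `p` are hypothetical, and the paper's identity measure is not sketched.

```python
import re
import zlib


def fingerprint(text: str, n: int = 4, p: int = 4) -> set[int]:
    """Hash each word n-gram; keep only hashes divisible by p ("0 mod p")."""
    words = re.findall(r"\w+", text.lower())
    hashes = {
        zlib.crc32(" ".join(words[i : i + n]).encode("utf-8"))
        for i in range(len(words) - n + 1)
    }
    return {h for h in hashes if h % p == 0}


def fingerprint_overlap(a: str, b: str, n: int = 4, p: int = 4) -> float:
    """Fraction of a's selected hashes that also appear in b's fingerprint."""
    fa, fb = fingerprint(a, n, p), fingerprint(b, n, p)
    return len(fa & fb) / len(fa) if fa else 0.0


if __name__ == "__main__":
    original = "plagiarism detection compares documents by hashing overlapping word sequences"
    edited = "plagiarism detection compares texts by hashing overlapping word sequences"
    print(fingerprint_overlap(original, edited, p=1))
```

A deterministic hash such as CRC-32 is used instead of Python's built-in `hash()`. String hashing in Python is randomized per process, so fingerprints built with `hash()` would not be comparable across runs. Raising `p` shrinks the fingerprint but makes short shared passages more likely to be missed.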
Test cases for plagiarism detection software
TLDR
A typology of plagiarism, which makes clear that plagiarism is more than just an exact copy, is discussed. A collection of 42 German-language test cases, developed at HTW Berlin for testing plagiarism detection software, is also presented.
Citation indexes for science. A new dimension in documentation through association of ideas. 1955.
  • E. Garfield
  • Sociology, Medicine
  • International journal of epidemiology
  • 2006
‘The uncritical citation of disputed data by a writer, whether it be deliberate or not, is a serious matter. Of course, knowingly propagandizing unsubstantiated claims is particularly abhorrent, but…
Citation indexes for science; a new dimension in documentation through association of ideas.
TLDR
The uncritical citation of disputed data by a writer, whether it be deliberate or not, is a serious matter and critical notes are increasingly likely to be overlooked with the passage of time.
Plagiarism in natural and programming languages: an overview of current tools and technologies
TLDR
Techniques for detecting plagiarism in both natural and programming languages are discussed in this report to give the reader a comprehensive introduction to this area.