Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence

@inproceedings{Gipp2011CitationPM,
  title={Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence},
  author={Bela Gipp and Norman Meuschke},
  booktitle={DocEng '11},
  year={2011}
}
Plagiarism detection systems have been developed to locate instances of plagiarism, e.g., within scientific papers. [...] They must be capable of detecting transpositions, scaling and combinations in both local and global form. The algorithms are named Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence. The evaluation showed that, when these algorithms are combined, common forms of plagiarism can be detected reliably.
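The abstract does not spell out how Greedy Citation Tiling works. One hedged reading is that it follows the well-known Greedy String Tiling scheme, applied to sequences of citations instead of characters. The sketch below assumes each document has already been reduced to the ordered list of source identifiers it cites. The function name `greedy_citation_tiling` and the `min_len` threshold are hypothetical illustrations, not the paper's definitions.

```python
from typing import List, Sequence, Tuple

# (start index in a, start index in b, tile length); hypothetical representation
Tile = Tuple[int, int, int]


def greedy_citation_tiling(a: Sequence[str], b: Sequence[str], min_len: int = 2) -> List[Tile]:
    """Sketch: Greedy String Tiling applied to two citation sequences.

    Assumption: a and b list the cited sources in order of appearance.
    Returns non-overlapping tiles, i.e. identical contiguous citation runs.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles: List[Tile] = []
    while True:
        max_len = min_len - 1
        matches: List[Tile] = []
        # Find the longest runs of equal, still-unmarked citations.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    max_len, matches = k, [(i, j, k)]
                elif k == max_len and k >= min_len:
                    matches.append((i, j, k))
        if max_len < min_len:
            break  # no remaining run is long enough to count as a tile
        for i, j, k in matches:
            # Skip matches that overlap a tile already placed in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for x in range(k):
                marked_a[i + x] = True
                marked_b[j + x] = True
            tiles.append((i, j, k))
    return tiles


print(greedy_citation_tiling(["A", "B", "C", "D", "E"], ["X", "C", "D", "E", "A", "B"]))
```

In this example, the block `C, D, E` and the moved pair `A, B` are both recovered as tiles, `[(2, 1, 3), (0, 4, 2)]`. This shows how a tiling approach can tolerate the transpositions mentioned in the abstract. Because longer runs are placed first, a short accidental match cannot break up a longer shared citation block.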
Citation-based plagiarism detection - idea, implementation and evaluation
  • Bela Gipp
  • Computer Science
  • Bull. IEEE Tech. Comm. Digit. Libr.
  • 2012
TLDR
Citation-based Plagiarism Detection is by no means a replacement for the currently used text-based approaches, but should be considered a complement for identifying well-disguised plagiarism that is currently hard to find.
CitePlag: A Citation-based Plagiarism Detection System Prototype
TLDR
An open-source prototype of a citation-based plagiarism detection system called CitePlag is presented; it evaluates the citations of academic documents as language-independent markers to detect plagiarism.
Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space
TLDR
It is shown that a hybrid approach, which integrates detection methods using citations, semantic argument structure, and semantic word similarity with character-based methods to achieve higher detection performance for disguised plagiarism forms, makes semantic plagiarism detection feasible on large collections for the first time.
Counting Co-occurrences in Citations to Identify Plagiarised Text Fragments
TLDR
The value of identifying co-occurrences in citations is assessed by checking whether this method can identify cases of plagiarism in a dataset of scientific papers, showing that most of the cases in which co-occurrences were found indeed correspond to plagiarised passages.
Citation‐based plagiarism detection: Practicability on a large‐scale scientific corpus
TLDR
Evaluation of citation-based plagiarism detection (CbPD) in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles shows that the citation-based approach achieves superior ranking performance for heavily disguised plagiarism forms and is computationally more efficient than character-based approaches.
Comparing and combining Content‐ and Citation‐based approaches for plagiarism detection
TLDR
This work compares content- and citation-based approaches for plagiarism detection to evaluate whether they are complementary and whether their combination can improve detection quality, and concludes that combining the methods can be beneficial.
Web-based Demonstration of Semantic Similarity Detection Using Citation Pattern Visualization for a Cross Language Plagiarism Case
TLDR
A fully functional, web-based visualization of citation patterns for a verified cross-language plagiarism case is presented, allowing the user to interactively experience the benefits of citation pattern analysis for plagiarism detection.
Integrating syntax‐semantic‐based text analysis with structural and citation information for scientific plagiarism detection
TLDR
The proposed plagiarism detection system couples several modules: logical structure classification and citation parsing, two-stage candidate document selection, and syntax-semantic-based exhaustive passage-level analysis combined with plagiarism analysis using structural and citation information.
On the development of a plagiarism detection model based on citation analysis using a bibliographic database
TLDR
The step-by-step algorithm for querying the Web of Science and Scopus databases and analyzing the obtained data can be recommended for integration into plagiarism detection systems as an additional component.
Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations
TLDR
Overall, it is shown that analyzing the similarity of mathematical content and academic citations is a striking supplement to conventional text-based detection approaches for academic literature in the STEM disciplines.

References

Showing 1-10 of 59 references
Citation based plagiarism detection: a new approach to identify plagiarized work language independently
TLDR
This approach is based on citation analysis and allows duplicate and plagiarism detection even if a document has been paraphrased or translated, since the relative position of citations remains similar.
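The observation that the relative order of citations survives paraphrasing and translation can be illustrated with a longest-common-subsequence computation over citation sequences. This is what the name Longest Common Citation Sequence suggests, though the exact definition used in the paper is not given here. The minimal sketch below assumes that citations in both documents have already been resolved to shared source identifiers. The function name `lccs` and the identifiers `S1` to `S7` are hypothetical.

```python
from typing import List, Sequence


def lccs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Sketch: longest common subsequence of two citation sequences (dynamic programming)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the length of the longest common subsequence of a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal citation sequence.
    seq: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            seq.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return seq


original = ["S1", "S2", "S3", "S4", "S5"]    # citations in the source text
translated = ["S1", "S7", "S2", "S3", "S5"]  # reworded version: S4 dropped, S7 inserted
print(lccs(original, translated))
```

Even though the wording, and possibly the language, of the second text differs completely, the shared citation order `["S1", "S2", "S3", "S5"]` is recovered. Unlike tiling, a subsequence does not require the matching citations to be contiguous, so inserted or dropped citations do not break the match.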
Comparative evaluation of text- and citation-based plagiarism detection approaches using GuttenPlag
TLDR
It is shown that citation-based plagiarism detection performs significantly better than text-based procedures in identifying strong paraphrasing, translation and some idea plagiarism.
Intrinsic plagiarism analysis
TLDR
This work investigates whether plagiarism can be detected by a computer program when no reference can be provided, e.g., when the foreign sections stem from a book that is not available in digital form.
Intrinsic Plagiarism Detection
TLDR
It is shown that it is possible to identify potentially plagiarized passages by analyzing a single document with respect to variations in writing style, and new features for the quantification of style aspects are added.
Identifying free text plagiarism based on semantic similarity
It is common knowledge that plagiarism in academia goes as far back in time as research itself. However, in the last two decades this phenomenon of academic deception has turned into an academic plague.
Déjà vu: a database of highly similar citations in the scientific literature
TLDR
Déjà vu, a publicly available database of highly similar Medline citations identified by the text similarity search engine eTBLAST, is presented; it helps authors and editors identify highly similar citations, which may sometimes represent cases of unethical duplication.
Déjà vu - A study of duplicate citations in Medline
TLDR
Using text similarity searches, a database of manually verified duplicate citations was created to study author publication behavior; 0.04% of the citations with no shared authors were found to be highly similar and are thus potential cases of plagiarism.
Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar
TLDR
Results show that Scopus significantly alters the relative ranking of those scholars who appear in the middle of the rankings, and that Google Scholar stands out in its coverage of conference proceedings as well as international, non-English-language journals.
Signature Extraction for Overlap Detection in Documents
TLDR
A web-accessible text registry based on signature extraction is described; it extracts a small but diagnostic signature from each registered text for permanent storage and comparison against other stored signatures.