• Corpus ID: 6580894

CitePlag : A Citation-based Plagiarism Detection System Prototype

Norman Meuschke, Bela Gipp, Corinna Breitinger
This paper presents an open-source prototype of a citation-based plagiarism detection system called CitePlag. The algorithms consider multiple citation-related factors, such as the proximity and order of citations within the text or their probability of co-occurrence, to compute document similarity scores. We present technical details of CitePlag’s detection algorithms and the acquisition of test data from the PubMed Central Open Access Subset. Future advancement of the prototype lies in…


Comparing and combining Content‐ and Citation‐based approaches for plagiarism detection

This work compares content-based and citation-based approaches to plagiarism detection to evaluate whether they are complementary and whether combining them improves detection quality; it concludes that a combination of the methods can be beneficial.

Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts

Hamtajoo, a Persian plagiarism detection system for academic manuscripts, is introduced, and the overall structure of the system and the algorithms used in each stage are described.

Integrating syntax‐semantic‐based text analysis with structural and citation information for scientific plagiarism detection

The proposed plagiarism detection system employs the effective coupling of various modules, namely, logical structure classifications and citation parsing, two‐stage candidate document selections, syntax‐semantic‐based exhaustive passage level analysis with plagiarism analysis using structural and citation information.

NeoPlag: An Ecosystem to Support the Development and Evaluation of New Algorithms to Detect Plagiarism

A novel ecosystem is presented that supports the development of new plagiarism detection algorithms, the testing of existing algorithms, and benchmarking analysis; a basic detection algorithm based on the vector space model was developed and uploaded into the system.
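
As an illustration of what a basic vector space model detector might look like, the sketch below compares two texts by the cosine of their term-frequency vectors. It is not NeoPlag's algorithm: the function name `cosine_similarity`, the whitespace tokenization, and the raw term-count weighting are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of raw term-frequency vectors (illustrative only)."""
    # Assumption: lowercase whitespace tokens stand in for real preprocessing.
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the terms the two texts share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

A score near 1 means the two texts use the same terms in similar proportions; a score of 0 means they share no terms.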

State-of-the-art in detecting academic plagiarism

In the future, plagiarism detection systems may benefit from combining traditional character-based detection methods with these emerging detection approaches, including intrinsic, cross-lingual and citation-based plagiarism detection.

Text Mining for Plagiarism Detection: Multivariate Pattern Detection for Recognition of Text Similarities

A text mining methodology is proposed that can detect all common patterns between a document and the documents in a reference database; it has been applied to a well-defined dataset with very promising results, identifying difficult cases of plagiarism such as technical disguise.

Citation-based Plagiarism Detection

  • Bela Gipp
  • Physics
    Springer Fachmedien Wiesbaden
  • 2014
When the author first considered the use of citation information as a method to detect plagiarism, he assumed this concept had already been explored or even integrated into today’s plagiarism

Survey of Plagiarism Detection Approaches and Big data Techniques related to Plagiarism Candidate Retrieval

An overview of the best-known plagiarism detection methods is given, and the concept of big data is defined as one of the techniques applied in the phase of extracting source documents for plagiarism detection.

Visualizing Feature-based Similarity for Research Paper Recommendation

Results from a study with 10 expert users show that the interactive visualization interface proposed can effectively address specialized information retrieval tasks, which cannot be addressed by existing research paper search or recommendation interfaces.

An academic Arabic corpus for plagiarism detection: design, construction and experimentation

  • Eman Al-Thwaib, Bassam H. Hammo, Sane Yagi
  • Computer Science
    International Journal of Educational Technology in Higher Education
  • 2020
The design and construction of an Arabic PD reference corpus that is dedicated to academic language and a database for the detection of plagiarism in student assignments, reports, and dissertations is discussed.



Citation pattern matching algorithms for citation-based plagiarism detection: greedy citation tiling, citation chunking and longest common citation sequence

Three algorithms, Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence, are introduced, and it is shown that combining them allows common forms of plagiarism to be detected reliably.
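
To make the third of these concrete, the sketch below treats each document as the ordered list of sources it cites and finds a longest common subsequence of two such lists with the textbook dynamic-programming method. This is an assumed reading of Longest Common Citation Sequence, not the paper's implementation; the function name and the list-of-identifiers input format are hypothetical.

```python
def longest_common_citation_sequence(cites_a, cites_b):
    """Return one longest common subsequence of two citation lists."""
    m, n = len(cites_a), len(cites_b)
    # table[i][j] holds the LCS length of cites_a[i:] and cites_b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if cites_a[i] == cites_b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one such subsequence.
    result = []
    i = j = 0
    while i < m and j < n:
        if cites_a[i] == cites_b[j]:
            result.append(cites_a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result
```

Because a subsequence may skip elements, the shared citations are still found in order when other citations have been inserted between them or some have been dropped.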

Citation based plagiarism detection: a new approach to identify plagiarized work language independently

This approach is based on citation analysis and allows duplicate and plagiarism detection even if a document has been paraphrased or translated, since the relative position of citations remains similar.

Citation Proximity Analysis (CPA) : A New Approach for Identifying Related Work Based on Co-Citation Analysis

The approach called Citation Proximity Analysis (CPA) is a further development of co-citation analysis, but in addition, considers the proximity of citations to each other within an article’s full-text.
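
One way to picture proximity-aware co-citation is to weight each co-cited pair by how close the two citations appear in the citing text. The sketch below is a hedged illustration only: the weights in `WEIGHTS`, the tuple format, and the function name `proximity_scores` are assumptions for this example and are not taken from the paper. Paragraph numbers are assumed to be local to a section, and sentence numbers local to a paragraph.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-citations count more.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25}


def proximity_scores(citations):
    """Score co-cited pairs from (ref_id, section, paragraph, sentence) tuples."""
    scores = defaultdict(float)
    for first, second in combinations(citations, 2):
        ref_1, sec_1, par_1, sen_1 = first
        ref_2, sec_2, par_2, sen_2 = second
        if ref_1 == ref_2 or sec_1 != sec_2:
            continue  # same source, or too far apart to count in this sketch
        if par_1 != par_2:
            level = "section"
        elif sen_1 != sen_2:
            level = "paragraph"
        else:
            level = "sentence"
        # frozenset makes the pair key independent of citation order.
        scores[frozenset((ref_1, ref_2))] += WEIGHTS[level]
    return dict(scores)
```

Plain co-citation analysis would give every pair in the same document the same weight; here two sources cited in one sentence score higher than two cited in different paragraphs of the same section.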

Systematic Characterizations of Text Similarity in Full Text Biomedical Publications

While quantifying abstract similarity is an effective approach for finding duplicate citations, a comprehensive full text analysis is necessary to uncover all potential duplicate citations in the scientific literature and is helpful when establishing ethical guidelines for scientific publications.

SPLAT: A System for Self-Plagiarism Detection

This paper presents a system for self-plagiarism detection, SPLAT. The system uses a WebL web spider that crawls through the web sites of the top fifty Computer Science departments, downloading

Sentence boundary detection: a comparison of paradigms for improving MT quality

A comparison of different paradigms for the detection of sentence boundaries in written text is presented: directly encoding the knowledge in a program, a rule-based system relying on regular expressions to describe boundaries, and a statistical maximum-entropy learning algorithm to obtain knowledge about boundaries.
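
For a sense of the rule-based paradigm, the sketch below splits text with a single regular expression: a boundary is assumed wherever sentence-ending punctuation is followed by whitespace and an uppercase letter. The rule and the function name `split_sentences` are illustrative assumptions, not the rules evaluated in the paper.

```python
import re

# Assumed rule: ., ! or ? followed by whitespace and an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split text at the assumed boundary rule, dropping empty pieces."""
    return [part for part in BOUNDARY.split(text.strip()) if part]
```

A single rule like this also splits after abbreviations such as "Dr." when a capitalized name follows, which is the kind of error that motivates comparing it against learned approaches.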

Plagiarism analysis, authorship identification, and near-duplicate detection PAN'07

The goal of the workshop was to bring together experts and prospective researchers around the exciting and future-oriented topic of plagiarism analysis, authorship identification, and high similarity

Test cases for plagiarism detection software

A typology of plagiarism, which makes clear that plagiarism is more than just an exact copy, is discussed, and a collection of 42 test cases in German, developed at the HTW Berlin for testing plagiarism detection software, is presented.

dTagger: A POS Tagger

The Lexical Systems Group at the National Library of Medicine (NLM) has developed a Part-of-Speech (POS) tagger to be freely distributed with the SPECIALIST NLP Tools. dTagger is specifically

Automatically Adapting an NLP Core Engine to the Biology Domain

In the first evaluation ever of a ML-based ensemble of core NLP components in the biology domain, it is demonstrated that the performance of OpenNLP’s sentence splitter, tokenizer, part-of-speech tagger, chunker and parser matches up with state-of-the-art performance figures from the newspaper domain.