Intrinsic plagiarism analysis

@article{Stein2011IntrinsicPA,
  title={Intrinsic plagiarism analysis},
  author={Benno Stein and Nedim Lipka and P. Prettenhofer},
  journal={Language Resources and Evaluation},
  year={2011},
  volume={45},
  pages={63--82}
}
Research in automatic text plagiarism detection focuses on algorithms that compare suspicious documents against a collection of reference documents. Recent approaches perform well in identifying copied or modified foreign sections, but they assume a closed world where a reference collection is given. This article investigates the question whether plagiarism can be detected by a computer program if no reference can be provided, e.g., if the foreign sections stem from a book that is not available…
On the use of character n-grams as the only intrinsic evidence of plagiarism
TLDR
It is demonstrated empirically that the low- and the high-frequency n-grams are not equally relevant for intrinsic plagiarism detection, but their performance depends on the way they are exploited.
Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style
TLDR
Text mining is applied to explore the use of words as a linguistic feature for modeling the writing style present in a document, and it is demonstrated that this feature shows promise in this area, achieving reasonable results compared to benchmark models.
An efficient classification approach in imbalanced datasets for intrinsic plagiarism detection
TLDR
This work considers, for the first time, imbalanced data as a crucial parameter of the problem, experiments with various balancing techniques, and combines features and imbalanced dataset treatment with various classification methods.
A Detection Method for Plagiarism Reports of Students
TLDR
This paper deals with plagiarism in student reports and proposes a method to detect it efficiently and accurately: each of the two texts to be compared is converted into a one-dimensional string, the strings are repeatedly shifted and compared against each other, and the matching sections of words are checked.
Intrinsic Plagiarism Detection and Author Analysis by Utilizing Grammar
With the advent of the world wide web the number of freely available text documents has increased considerably in the last years. As one of the immediate results, it has become easier to find sources…
Citation-based plagiarism detection - idea, implementation and evaluation
  • Bela Gipp
  • Computer Science
  • Bull. IEEE Tech. Comm. Digit. Libr.
  • 2012
TLDR
Citation-based Plagiarism Detection is by no means a replacement for the currently used text-based approaches, but should be considered as a complement for identifying currently hard to find well-disguised plagiarisms.
An Improved Topic Masking Technique for Authorship Analysis
TLDR
POSNoise is able to outperform a well-known topic masking approach in 51 out of 64 cases with up to 12.5% improvement in terms of accuracy and it is shown that for corpora preprocessed with POSNoise, the AV methods examined often achieve higher accuracies compared to the original corpora.
Comparative evaluation of text- and citation-based plagiarism detection approaches using GuttenPlag
TLDR
It is shown that citation-based plagiarism detection performs significantly better than text-based procedures in identifying strong paraphrasing, translation and some idea plagiarism.
Plagiarism Detection
Plagiarism is certainly a problem in today’s world, and it probably has been ever since writing was invented. Developing an effective, automated tool for detecting plagiarism is both practically…
Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods
TLDR
A new taxonomy of plagiarism is presented that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view, and supports deep understanding of different linguistic patterns in committing plagiarism.

References

SHOWING 1-10 OF 78 REFERENCES
Intrinsic Plagiarism Detection
TLDR
It is shown that it is possible to identify potentially plagiarized passages by analyzing a single document with respect to variations in writing style, and new features for the quantification of style aspects are added.
Intrinsic Plagiarism Analysis with Meta Learning
TLDR
A hybrid approach is proposed that employs style marker analysis to generate hypotheses, which are then accepted or rejected by an authorship verification analysis; style markers for German text and their application to a real-world plagiarism case are also evaluated.
Meta Analysis within Authorship Verification
TLDR
This paper introduces authorship verification problems as decision problems, discusses possibilities for the use of meta knowledge, and applies meta analysis to post-process unreliable style analysis results.
Plagiarism Detection Without Reference Collections
Current research in the field of automatic plagiarism detection for text documents focuses on the development of algorithms that compare suspicious documents against potential original documents.
A survey of modern authorship attribution methods
TLDR
A survey of recent advances in automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification.
Using Conjunctions and Adverbs for Author Verification
TLDR
This work proposes a stylometric feature set based on conjunctions and adverbs of the Portuguese language to address the problem of author identification and demonstrates that the proposed strategy can produce results comparable to those in the literature.
Computer-Based Authorship Attribution Without Lexical Measures
TLDR
This paper presents a fully-automated approach to the identification of the authorship of unrestricted text that excludes any lexical measure and adapts a set of style markers to the analysis of the text performed by an already existing natural language processing tool using three stylometric levels.
Authorship Attribution
  • P. Juola
  • Psychology, Computer Science
  • Found. Trends Inf. Retr.
  • 2006
TLDR
This review shows that the authorship attribution discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less studied languages; it further analyzes the types of analysis and features used and tries to determine characteristics of well-performing systems, finally formulating these in a set of recommendations for best practices.
Authorship Attribution of Texts: A Review
TLDR
Several theoretical approaches are surveyed, including ones approximating the apparently nearly optimal approach based on Kolmogorov conditional complexity, along with some case studies: attributing the Shakespeare canon and newly discovered works, as well as newly discovered works allegedly by M. Twain; binary (Madison vs. Hamilton) discrimination of the Federalist Papers using Naive Bayes and other classifiers; and steganography presence testing.
Methods for Identifying Versioned and Plagiarized Documents
TLDR
The identity measure and the best fingerprinting technique are both able to accurately identify coderivative documents, and it is demonstrated that the identity measure is clearly superior for fingerprinting parameters.