On the use of character n-grams as the only intrinsic evidence of plagiarism

@article{Bensalem2019OnTU,
  title={On the use of character n-grams as the only intrinsic evidence of plagiarism},
  author={Imene Bensalem and P. Rosso and S. Chikhi},
  journal={Language Resources and Evaluation},
  year={2019},
  pages={1--34}
}
Abstract: When a shift in writing style is noticed in a document, doubts arise about its originality. Based on this clue to plagiarism, the intrinsic approach to plagiarism detection identifies the stolen passages by analysing the writing style of the suspicious document without comparing it to textual resources that may serve as sources for the plagiarist. Character n-grams are recognised as a successful approach to modelling text for writing style analysis. Although prior studies have…
Intrinsic Plagiarism Detection System Using Stylometric Features and DBSCAN
Plagiarism is the act of using someone else’s words or ideas without giving them due credit and representing them as one’s own work. In today’s world, it is very easy to plagiarize others’ work due to…
Hybrid plagiarism detection method for French language
With the growth of content on the Web, any information can be plagiarized. Plagiarism is the process of using the ideas of another without naming the source. Consequently,…
Paraphrase type identification for plagiarism detection using contexts and word embeddings
Paraphrase types have been proposed by researchers as the paraphrasing mechanisms underlying acts of plagiarism. Synonymous substitution, word reordering and insertion/deletion have been identified…
Text Borrowings Detection System for Natural Language Structured Digital Documents
TLDR
A method for comparing structured natural-language digital documents is developed, and the features and advantages of the resulting system are presented.
Advanced Models for Stylometric Applications
Applications to Political Speeches
Author Profiling of Tweets
Basic Lexical Concepts and Measurements
Distance-Based Approaches
Elena Ferrante: A Case Study in Authorship Attribution

References

SHOWING 1-10 OF 60 REFERENCES
Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style
TLDR
Text mining is applied to explore the use of words as a linguistic feature for modeling the writing style of a document. The feature shows promise in this area, achieving reasonable results compared to benchmark models.
Intrinsic Plagiarism Detection Using Character n-gram Profiles
TLDR
A new method is presented that attempts to quantify the style variation within a document using character n-gram profiles and a style change function based on an appropriate dissimilarity measure originally proposed for author identification.
On the Robustness of Authorship Attribution Based on Character N-gram Features
TLDR
Comparative results with another competitive text representation approach based on very frequent words show that character n-grams are better able to capture stylistic properties of text when there are significant differences among the training and test corpora.
Intrinsic plagiarism analysis
TLDR
This work investigates whether plagiarism can be detected by a computer program when no reference can be provided, e.g., when the foreign sections stem from a book that is not available in digital form.
Optimisation of Character n-gram Profiles Method for Intrinsic Plagiarism Detection
TLDR
This paper investigates and improves the performance of the character n-gram profiles method proposed by Stamatatos by tuning its parameter settings and proposing new modifications and rich feature sets, which raises the overall plagdet score.
Author Verification Using Common N-Gram Profiles of Text Documents
TLDR
This work proposes a proximity-based method for one-class classification that applies the Common N-Gram (CNG) dissimilarity measure and utilizes the pairs of most dissimilar documents among documents of known authorship.
N-Gram Feature Selection for Authorship Identification
TLDR
This paper proposes a variable-length n-gram approach inspired by previous work for selecting variable-length word sequences and explores the significance of digits for distinguishing between authors, showing that an increase in performance can be achieved using simple text pre-processing.
Intrinsic Plagiarism Detection using N-gram Classes
TLDR
A novel language-independent intrinsic plagiarism detection method is introduced, based on a new text representation called n-gram classes; it is comparable to the best state-of-the-art methods.
Not All Character N-grams Are Created Equal: A Study in Authorship Attribution
TLDR
It is demonstrated that character n-grams that capture information about affixes and punctuation account for almost all of the power of character n-grams as features.
Comparative evaluation of text- and citation-based plagiarism detection approaches using GuttenPlag
TLDR
It is shown that citation-based plagiarism detection performs significantly better than text-based procedures in identifying strong paraphrasing, translation and some idea plagiarism.