• Corpus ID: 676966

Detecting Plagiarism in Text Documents through Grammar-Analysis of Authors

Michael Tschuggnall and Günther Specht
Published in Datenbanksysteme für Business, Technologie und Web
The task of intrinsic plagiarism detection is to find plagiarized sections within text documents without using a reference corpus. If suspicious sentences are found by computing the pq-gram distance of grammar trees and by utilizing a Gaussian normal distribution, the algorithm tries to select and combine those sentences into potentially plagiarized sections. The parameters and thresholds needed by the algorithm are optimized by using genetic algorithms. Finally, the approach is evaluated…
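
The Gaussian step described above can be illustrated with a minimal sketch. It assumes the pairwise sentence distances (for example pq-gram distances of grammar trees) have already been computed and are given as a square matrix. The function name `suspicious_sentences` and the threshold factor `k` are hypothetical and are not taken from the paper, which tunes its own parameters with genetic algorithms.

```python
import statistics


def suspicious_sentences(distances, k=2.0):
    """Return indices of sentences whose mean distance to all other
    sentences lies more than k standard deviations above the average.

    distances: square list-of-lists of pairwise sentence distances
               (assumed precomputed, e.g. pq-gram distances).
    k:         hypothetical threshold factor, not a value from the paper.
    """
    n = len(distances)
    if n < 2:
        return []
    # Mean distance of each sentence to every other sentence.
    means = [
        sum(row[j] for j in range(n) if j != i) / (n - 1)
        for i, row in enumerate(distances)
    ]
    # Fit a normal distribution to the per-sentence means.
    mu = statistics.mean(means)
    sigma = statistics.pstdev(means)
    threshold = mu + k * sigma
    # Flag sentences that fall in the upper tail of the distribution.
    return [i for i, m in enumerate(means) if m > threshold]
```

Selecting and merging flagged sentences into sections, as the abstract describes, would be a separate step that this sketch does not cover.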

Using Grammar-Profiles to Intrinsically Expose Plagiarism in Text Documents

A novel approach to intrinsic plagiarism detection is described by analyzing the grammar of authors and using sliding windows to find significant differences in writing styles, which shows very promising results.


This research proposes using Natural Language Processing (NLP) to create a new way to detect plagiarism and presents an accuracy comparison between Ferret, WCopyFind, and the proposed algorithm.

Countering Plagiarism by Exposing Irregularities in Authors' Grammar

A novel approach to intrinsic plagiarism detection is presented that analyzes syntactic information of authors and finds irregularities in sentence constructions. It follows the assumption that authors have a mostly unconsciously used set of rules for building sentences, which can be utilized to distinguish authors.

Intrinsic Plagiarism Detection and Author Analysis by Utilizing Grammar

With the advent of the world wide web the number of freely available text documents has increased considerably in the last years. As one of the immediate results, it has become easier to find sources

A Survey of Plagiarism Detection Strategies and Methodologies in Text Document

  • Computer Science
  • 2015
A detailed survey of earlier plagiarism detection techniques and some of the recent techniques is presented, and it is noted that semantic analysis of sentences helps to find plagiarized sentences.

Plagiarism Detection Using Artificial Intelligence Technique In Multiple Files

A new method is proposed and implemented in a program, in which a text set is used to identify copied parts by comparison with multiple existing files, applying the k-NN machine learning algorithm.

Plagiarism Detection Tools for Scientific e-Journals Publishing

Recommendations are presented for tools that meet the needs of scholars and can be used by the editors of scientific journals, based on defined software specifications, operating performance, and the results obtained during checking.

Plagiarism Detection through Data Mining Techniques

The data mining techniques will be used to increase the efficiency of detection of plagiarism and improve the reliability of the operation.

Plagiarism detection on electronic text based assignments using vector space model

This tool could be used to evaluate text-based electronic assignments effectively and to minimize plagiarism among students; the cosine similarity measure using the trigram technique is preferable to the others.
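
As a rough illustration of a cosine similarity measure over trigrams in a vector space model, consider the sketch below. It assumes word trigrams with raw counts as vector components; the summary above does not say whether the tool uses word or character trigrams or how it weights terms, so these choices and the function names are assumptions.

```python
import math
from collections import Counter


def trigram_counts(text):
    """Count word trigrams in a text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + 3]) for i in range(len(words) - 2))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the trigram count vectors of two texts."""
    va, vb = trigram_counts(text_a), trigram_counts(text_b)
    # Only trigrams present in both texts contribute to the dot product.
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # texts shorter than three words have no trigrams
    return dot / (norm_a * norm_b)
```

A score near 1 indicates that two assignments share most of their trigrams, while a score of 0 means they share none.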

Academic Plagiarism Detection

The integration of heterogeneous analysis methods for textual and non-textual content features using machine learning is seen as the most promising area for future research contributions to improve the detection of academic plagiarism further.



Plag-Inn: Intrinsic Plagiarism Detection Using Grammar Trees

A novel approach to plagiarism detection is described that processes and analyzes the grammar of a suspicious document by splitting the text into single sentences and calculating grammar trees.

Intrinsic Plagiarism Detection Using Character n-gram Profiles

A new method is presented that attempts to quantify the style variation within a document using character n-gram profiles and a style change function based on an appropriate dissimilarity measure originally proposed for author identification.

Using Syntactic Information to Identify Plagiarism

A set of low-level syntactic structures that capture creative aspects of writing is presented, and it is shown that information about linguistic similarities of works improves recognition of plagiarism (over tf-idf-weighted keywords alone) when combined with similarity measurements based on tf-idf-weighted keywords.

Intrinsic Plagiarism Detection Using Character Trigram Distance Scores - Notebook for PAN at CLEF 2011

An algorithm for outlier detection in multivariate data (based on Principal Components Analysis) is applied to the distance matrix in order to detect plagiarized sections.

Putting Ourselves in SME’s Shoes: Automatic Detection of Plagiarism by the WCopyFind tool

Thanks in part to the large amount of information circulating on the Internet today, plagiarism has unfortunately become a very common practice, to the point of becoming one of the biggest problems of

Intrinsic Plagiarism Detection using Complexity Analysis

Kolmogorov complexity measures are introduced as a way of extracting structural information from texts for intrinsic plagiarism detection; more sophisticated compression algorithms that are suited to compressing the English language show great promise for feature extraction in various text classification problems.
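
Kolmogorov complexity is not computable, so it is commonly approximated by the output size of a real compressor. The sketch below uses zlib's DEFLATE from the Python standard library as a stand-in; the compressor choice and the function name `compression_ratio` are assumptions and are not the specific measures defined in the paper.

```python
import zlib


def compression_ratio(text):
    """Approximate the complexity of a text as compressed size / raw size.

    Lower values mean the text is more regular (more compressible).
    zlib is an assumed stand-in for the compressors studied in the paper.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)  # level 9 = strongest compression
    return len(compressed) / len(raw)
```

Computed over successive chunks of a document, such a ratio gives one feature per chunk, and chunks whose value deviates strongly from the rest could be inspected as possible style changes.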

Automatic Text Categorization in Terms of Genre and Author

This paper proposes a set of style markers including analysis-level measures that represent the way in which the input text has been analyzed and capture useful stylistic information without additional cost to take full advantage of existing natural language processing (NLP) tools.

FastDocode: Finding Approximated Segments of N-Grams for Document Copy Detection - Lab Report for PAN at CLEF 2010

Results on a learning dataset of plagiarized documents from PAN'09, and its further evaluation in the PAN'10 plagiarism detection challenge, showed that the trade-off between speed and performance could improve other plagiarism detection algorithms.

Semantic Sequence Kin: A Method of Document Copy Detection

The Semantic Sequence Kin (SSK) is tested and it is shown that SSK is excellent for detecting non-rewording plagiarism and valid even if documents are reworded to some extent.

External Plagiarism Detection Based on Standard IR Technology and Fast Recognition of Common Subsequences - Lab Report for PAN at CLEF 2010

The plagiarism detection system described in this paper aims to incorporate standard IR technologies for candidate selection and efficient data structures for the detailed analysis between a suspicious and a candidate document.