Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016

@inproceedings{Asghari2016AlgorithmsAC,
  title={Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016},
  author={Habibollah Asghari and Salar Mohtaj and Omid Fatemi and Heshaam Faili and Paolo Rosso and Martin Potthast},
  booktitle={FIRE},
  year={2016}
}
The task of plagiarism detection is to find passages of text reuse in a suspicious document. In the first subtask, nine teams participated, and the best result achieved was a PlagDet score of 0.92. For the second subtask, corpus construction, five teams submitted corpora, which were evaluated using the systems submitted for the first subtask. The results show that significant challenges remain in evaluating newly constructed corpora.
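The PlagDet score used to rank the first subtask combines character-level precision, recall and granularity, following the PAN evaluation framework: PlagDet = F1 / log2(1 + granularity), so splitting one plagiarism case into several detections is penalised. The following is a minimal sketch under a stated simplification: cases and detections are half-open character ranges in the suspicious document only, whereas PAN text alignment cases also carry a source-document range. The helper names and the neutral granularity of 1.0 when nothing is detected are assumptions of this sketch, not the official evaluation script.

```python
import math
from typing import List, Set, Tuple

# Simplification (assumption): a case or detection is a half-open character range
# [start, end) in the suspicious document only; PAN's text alignment cases also
# carry a source-document range, which this sketch ignores.
Span = Tuple[int, int]


def _chars(span: Span) -> Set[int]:
    """Character offsets covered by a span (spans are assumed non-empty)."""
    return set(range(span[0], span[1]))


def _overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def precision(cases: List[Span], detections: List[Span]) -> float:
    """Macro-averaged over detections: share of each detection lying inside some case."""
    if not detections:
        return 0.0
    case_chars = set().union(*(_chars(s) for s in cases))
    return sum(len(_chars(r) & case_chars) / len(_chars(r)) for r in detections) / len(detections)


def recall(cases: List[Span], detections: List[Span]) -> float:
    """Macro-averaged over cases: share of each case covered by some detection."""
    if not cases:
        return 0.0
    det_chars = set().union(*(_chars(r) for r in detections))
    return sum(len(_chars(s) & det_chars) / len(_chars(s)) for s in cases) / len(cases)


def granularity(cases: List[Span], detections: List[Span]) -> float:
    """Average number of detections per detected case (1.0 is ideal)."""
    detected = [s for s in cases if any(_overlaps(s, r) for r in detections)]
    if not detected:
        return 1.0  # assumption: neutral value when no case is detected
    return sum(sum(_overlaps(s, r) for r in detections) for s in detected) / len(detected)


def plagdet(cases: List[Span], detections: List[Span]) -> float:
    """PlagDet = F1 / log2(1 + granularity)."""
    p, r = precision(cases, detections), recall(cases, detections)
    f1 = 0.0 if p + r == 0 else 2 * p * r / (p + r)
    return f1 / math.log2(1 + granularity(cases, detections))


print(plagdet([(0, 100)], [(0, 100)]))            # one exact detection
print(plagdet([(0, 100)], [(0, 50), (50, 100)]))  # same coverage, split in two
```

The first call yields 1.0; the second has perfect precision and recall but granularity 2, so F1 is divided by log2(3) and the score drops to roughly 0.63.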

A Deep Learning Approach to Persian Plagiarism Detection

TLDR
In this paper, a deep-learning-based plagiarism detection method is proposed: words are represented as multi-dimensional vectors, and simple aggregation methods combine the word vectors into sentence representations.
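One simple aggregation of word vectors into a sentence representation is the element-wise mean, after which sentence pairs can be compared by cosine similarity. The sketch below assumes mean pooling and a hypothetical threshold of 0.8; the toy three-dimensional vectors and the function names are illustrative only and are not taken from the paper, which would use embeddings learned from real text.

```python
import math
from typing import Dict, List, Optional

# Hypothetical 3-dimensional toy vectors; a real system would load pretrained
# embeddings learned from a large (e.g. Persian) corpus.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "student": [0.9, 0.1, 0.0],
    "pupil": [0.85, 0.15, 0.05],
    "wrote": [0.1, 0.9, 0.1],
    "authored": [0.15, 0.85, 0.1],
    "essay": [0.0, 0.2, 0.95],
    "paper": [0.05, 0.25, 0.9],
    "rain": [0.0, -0.9, 0.1],
    "falls": [-0.8, 0.0, 0.0],
}


def sentence_vector(tokens: List[str], emb: Dict[str, List[float]]) -> Optional[List[float]]:
    """Mean of the known word vectors (one simple aggregation); None if no word is known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def is_candidate_pair(susp: List[str], src: List[str], emb: Dict[str, List[float]],
                      threshold: float = 0.8) -> bool:
    """Flag a suspicious/source sentence pair whose mean vectors are similar enough."""
    u, v = sentence_vector(susp, emb), sentence_vector(src, emb)
    return u is not None and v is not None and cosine(u, v) >= threshold


print(is_candidate_pair(["student", "wrote", "essay"], ["pupil", "authored", "paper"], TOY_EMBEDDINGS))
print(is_candidate_pair(["rain", "falls"], ["student", "wrote", "essay"], TOY_EMBEDDINGS))
```

The paraphrased pair shares no surface words yet is flagged, because its word vectors point in similar directions; the unrelated pair is not. Mean pooling ignores word order, which is one reason such methods are described as "simple".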

A crowdsourcing approach to construct mono-lingual plagiarism detection corpus

TLDR
HAMTA, a Persian plagiarism detection corpus, is proposed; evaluation results indicate a high correlation between the proposed corpus and the state-of-the-art PAN English plagiarism detection corpus.

Using Local Text Similarity in Pairwise Document Analysis for Monolingual Plagiarism Detection

TLDR
To retrieve plagiarised passages, this paper presents a pairwise plagiarism detection algorithm based on a vector space model that considers the proximity of terms, and evaluates its performance in terms of precision, recall, granularity and PlagDet.
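A vector space model can be made "local" by comparing term-frequency vectors of small overlapping windows instead of whole documents; restricting each bag of terms to a window of nearby tokens is one simple way to take term proximity into account. This is a minimal sketch under that assumption, not the authors' algorithm: the window size, step and similarity threshold are hypothetical parameters, and no proximity weighting beyond the window itself is modelled.

```python
import math
from collections import Counter
from typing import List, Tuple


def windows(tokens: List[str], size: int, step: int) -> List[Tuple[int, List[str]]]:
    """Overlapping windows of nearby tokens; the last window is aligned to the end."""
    if len(tokens) <= size:
        return [(0, tokens)]
    starts = list(range(0, len(tokens) - size + 1, step))
    if starts[-1] != len(tokens) - size:
        starts.append(len(tokens) - size)
    return [(i, tokens[i:i + size]) for i in starts]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def local_matches(susp: str, src: str, size: int = 8, step: int = 4,
                  threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (suspicious_start, source_start, similarity) for similar window pairs."""
    s_tokens, r_tokens = susp.lower().split(), src.lower().split()
    src_windows = [(j, Counter(w)) for j, w in windows(r_tokens, size, step)]
    matches = []
    for i, sw in windows(s_tokens, size, step):
        s_vec = Counter(sw)
        for j, r_vec in src_windows:
            sim = cosine(s_vec, r_vec)
            if sim >= threshold:
                matches.append((i, j, round(sim, 3)))
    return matches


a = "the model detects reused passages in suspicious documents quickly"
b = "quickly in suspicious documents the model detects reused passages"
print(local_matches(a, b, size=20))  # → [(0, 0, 1.0)]
```

Because each window is a bag of terms, reordering words inside a window does not lower the similarity, which illustrates why vector space approaches tolerate context reordering; matched window positions would then be merged into passages in a later step.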

ParsiPayesh: Persian Plagiarism Detection based on Semantic and Structural Analysis

TLDR
The results indicate that structural and semantic information improves the performance of the proposed method; to examine the semantic similarity of expressions, the authors suggest using semantic role labels obtained from the presented deep learning model.

Persian Plagiarism Detection Using Sentence Correlations

TLDR
This report describes the Persian plagiarism detection system whose run was submitted to the Persian PlagDet competition at FIRE 2016; performance measures on the training corpus were promising.

Academic Plagiarism Detection: A Systematic Literature Review

TLDR
The integration of heterogeneous analysis methods for textual and non-textual content features using machine learning is seen as the most promising area for future research contributions to improve the detection of academic plagiarism further.

A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection

TLDR
To retrieve plagiarised passages, a plagiarism detection method based on a vector space model and insensitive to context reordering is presented and evaluated in terms of precision, recall, granularity and PlagDet.

Scalable and language-independent embedding-based approach for plagiarism detection considering obfuscation type: no training phase

TLDR
This paper employs text embedding vectors to compare the similarity of documents for plagiarism detection, and applies the proposed method to available English, Persian and Arabic datasets for the text alignment task to evaluate its robustness across languages.

Graph-based Approach to Text Alignment for Plagiarism Detection in Persian Documents

This paper presents a new approach for Persian plagiarism detection. The approach uses a graph structure together with an iterative graph similarity method for similarity detection.

PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection

TLDR
PerPaDa is a large collection of original and paraphrased sentences gathered from Hamtajoo, a Persian plagiarism detection system in which users try to conceal cases of text reuse in their documents by paraphrasing and resubmitting their manuscripts for analysis.

References


A Deep Learning Approach to Persian Plagiarism Detection

TLDR
In this paper, a deep-learning-based plagiarism detection method is proposed: words are represented as multi-dimensional vectors, and simple aggregation methods combine the word vectors into sentence representations.

Persian Plagiarism Detection Using Sentence Correlations

TLDR
This report describes the Persian plagiarism detection system whose run was submitted to the Persian PlagDet competition at FIRE 2016; performance measures on the training corpus were promising.

A Pairwise Document Analysis Approach for Monolingual Plagiarism Detection

TLDR
To retrieve plagiarised passages, a plagiarism detection method based on a vector space model and insensitive to context reordering is presented and evaluated in terms of precision, recall, granularity and PlagDet.

Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013

TLDR
This paper describes the authors' approach in the PAN@CLEF 2013 plagiarism detection competition and proposes a sentence-similarity-based method that extracts keywords from suspicious documents to use as queries for retrieving plagiarism source documents.

Graph-based Approach to Text Alignment for Plagiarism Detection in Persian Documents

This paper presents a new approach for Persian plagiarism detection. The approach uses a graph structure together with an iterative graph similarity method for similarity detection.

Overview of the AraPlagDet PAN@FIRE2015 Shared Task on Arabic Plagiarism Detection

TLDR
This overview paper describes the evaluation corpora for Arabic plagiarism detection methods, discusses the participants' methods, and highlights their building blocks that could be language dependent.

Developing Bilingual Plagiarism Detection Corpus Using Sentence Aligned Parallel Corpus: Notebook for PAN at CLEF 2015

TLDR
A bilingual Persian-English sentence-aligned parallel corpus, in combination with Wikipedia articles, is used to create a plagiarism detection corpus based on parallel corpus sentences.

Evaluation of Text Reuse Corpora for Text Alignment Task of plagiarism Detection

TLDR
This paper addresses the text alignment task of the 7th International Competition on Plagiarism Detection, PAN 2015, and finds that most of the plagiarism cases in the prepared corpora have rather high quality in terms of "rate of obfuscation" alongside "preserving the concepts".

Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015

TLDR
This paper describes the approach for constructing a monolingual Persian plagiarism corpus that can be used to evaluate the performance of Persian plagiarism detection systems.
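The summary above does not list the obfuscation operations used. In the PAN literature, artificial (random) obfuscation is commonly described as random word-level text operations such as shuffling, deleting, inserting or replacing words. The sketch below assumes exactly those four operations; the function name, the filler vocabulary and the seed parameter are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Optional


def random_obfuscation(tokens: List[str], n_ops: int, vocabulary: List[str],
                       seed: Optional[int] = None) -> List[str]:
    """Attempt n_ops random word-level edits: swap, delete, insert or replace."""
    rng = random.Random(seed)  # seeded generator for reproducible corpora
    out = list(tokens)  # copy so the source passage is left untouched
    for _ in range(n_ops):
        op = rng.choice(["swap", "delete", "insert", "replace"])
        if op == "swap" and len(out) > 1:
            i = rng.randrange(len(out) - 1)
            out[i], out[i + 1] = out[i + 1], out[i]  # swap two neighbouring words
        elif op == "delete" and len(out) > 1:
            del out[rng.randrange(len(out))]
        elif op == "insert":
            out.insert(rng.randrange(len(out) + 1), rng.choice(vocabulary))
        elif op == "replace" and out:
            out[rng.randrange(len(out))] = rng.choice(vocabulary)
        # an operation whose precondition fails is skipped
    return out


source = "plagiarism detection systems compare suspicious and source documents".split()
print(" ".join(random_obfuscation(source, n_ops=3, vocabulary=["text", "corpus", "method"], seed=7)))
```

Raising `n_ops` relative to the passage length yields a higher degree of obfuscation; the obfuscated passage would then be inserted into a host document and its offsets recorded as a plagiarism case.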

Developing Monolingual English Corpus for Plagiarism Detection using Human Annotated Paraphrase Corpus

TLDR
An approach for creating a monolingual English plagiarism detection corpus for the text alignment corpus construction task of the PAN 2015 competition is described, and two different fragment obfuscation methods for creating plagiarism cases are proposed.