Corpus ID: 15922022

A Deep Learning Approach to Persian Plagiarism Detection

@inproceedings{Gharavi2016ADL,
  title={A Deep Learning Approach to Persian Plagiarism Detection},
  author={Erfaneh Gharavi and Kayvan Bijari and Kiarash Zahirnia and Hadi Veisi},
  booktitle={FIRE},
  year={2016}
}
Plagiarism detection is defined as the automatic identification of reused text materials. [...] In the proposed method, words are represented as multi-dimensional vectors, and simple aggregation methods are used to combine the word vectors into sentence representations. By comparing the representations of source and suspicious sentences, the sentence pairs with the highest similarity are considered candidates for plagiarism. The decision on whether a candidate is plagiarism is made using a two-level evaluation method…
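The sentence-matching step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 4-dimensional English embeddings, the choice of averaging as the "simple aggregation method", cosine similarity as the comparison, and the names `sentence_vector`, `cosine`, and `candidate_pairs` are all assumptions made for illustration. The two-level evaluation step is not covered, because the abstract is truncated before describing it.

```python
import math

# Hypothetical toy word vectors (illustration only). A real system would
# load embeddings trained on a large Persian corpus.
EMBEDDINGS = {
    "students": [0.9, 0.1, 0.0, 0.0],
    "pupils":   [0.8, 0.2, 0.1, 0.0],
    "copy":     [0.1, 0.9, 0.0, 0.0],
    "reuse":    [0.2, 0.8, 0.1, 0.0],
    "text":     [0.0, 0.2, 0.9, 0.0],
    "essays":   [0.1, 0.3, 0.8, 0.0],
    "rain":     [0.0, 0.0, 0.1, 0.9],
    "falls":    [0.1, 0.0, 0.0, 0.9],
    "today":    [0.0, 0.1, 0.0, 0.9],
}
DIM = 4


def sentence_vector(sentence):
    """Average the vectors of known words (assumed aggregation method)."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def candidate_pairs(source_sentences, suspicious_sentences):
    """For each suspicious sentence, return the most similar source sentence.

    Each result is (suspicious_index, source_index, similarity).
    Returns an empty list if there are no source sentences.
    """
    if not source_sentences:
        return []
    source_vectors = [sentence_vector(s) for s in source_sentences]
    pairs = []
    for i, suspicious in enumerate(suspicious_sentences):
        query = sentence_vector(suspicious)
        scores = [cosine(query, sv) for sv in source_vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, best, scores[best]))
    return pairs


if __name__ == "__main__":
    source = ["rain falls today", "students copy text"]
    suspicious = ["pupils reuse essays"]
    for susp_idx, src_idx, score in candidate_pairs(source, suspicious):
        print(suspicious[susp_idx], "<->", source[src_idx], round(score, 3))
```

In this sketch every suspicious sentence yields one candidate, however low its best score; a later decision step (such as the two-level evaluation the abstract mentions) would be needed to accept or reject each candidate.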
A deep learning based technique for plagiarism detection: a comparative study
TLDR
A comparative study, based on criteria such as vector representation method, level of treatment, similarity method, and dataset, that gives an overview of different proposals for plagiarism detection based on deep learning algorithms.
Deep Learning Based Technique for Plagiarism Detection in Arabic Texts
TLDR
The similarity measures show that simple changes in text, such as changing one word or changing the position of verbs and nouns, result in a similarity value of 99%, which makes it possible to detect plagiarism even if the text is altered by replacing words with their synonyms or changing the word order.
Deep Learning Approach to Detect Plagiarism in Sinhala Text
TLDR
A word embedding model is built using a deep learning neural network and a Sinhala text corpus, and is found to be capable of detecting plagiarism with an accuracy of 97%.
Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016
TLDR
The Persian PlagDet shared task at PAN 2016 was organized in an effort to promote the comparative assessment of NLP techniques for plagiarism detection with a special focus on plagiarism that appears in a Persian text corpus.
A Novel Framework for Plagiarism Detection: A Case Study for Urdu Language
TLDR
A novel framework for plagiarism detection specifically for the Urdu language is proposed, and a corpus of Urdu text is developed to measure the similarity between suspicious and source text, producing a significant improvement in plagiarism detection performance compared with existing methods.
Scalable and language-independent embedding-based approach for plagiarism detection considering obfuscation type: no training phase
TLDR
This paper employs text embedding vectors to compare similarity among documents in order to detect plagiarism, and applies the proposed method to available datasets in English, Persian, and Arabic on the text alignment task to evaluate its robustness from the language perspective.
Detecting Similarity in Paraphrased Persian Texts using Semantic and Probabilistic Methods
TLDR
A semantic algorithm that employs a dictionary to detect paraphrased sentences, and a probabilistic algorithm that uses statistical information obtained from a large corpus of Persian texts to detect similar texts; the latter is the first probabilistic text alignment algorithm proposed for the Persian language.
ParsiPayesh: Persian Plagiarism Detection based on Semantic and Structural Analysis
In recent years, the rapid growth of Persian electronic resources and the ease of access to them have seriously aggravated the plagiarism problem in the Iranian scientific community. Despite the…
Corpus-Based Paraphrase Detection Experiments and Review
TLDR
A performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection shows that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
Improving The Detection of Plagiarism in Scientific Articles Using Machine Learning Approaches
TLDR
The purpose of this study is to protect intellectual property and ideas, and to improve the performance and accuracy of plagiarism detection.

References

SHOWING 1-10 OF 46 REFERENCES
Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016
TLDR
The Persian PlagDet shared task at PAN 2016 was organized in an effort to promote the comparative assessment of NLP techniques for plagiarism detection with a special focus on plagiarism that appears in a Persian text corpus.
Automatic external Persian plagiarism detection using vector space model
TLDR
An external Persian plagiarism detection method based on the vector space model (VSM) has been proposed and a Persian corpus has been developed to implement and examine this method.
Dynamically Adjustable Approach through Obfuscation Type Recognition
TLDR
This work describes an approach to the text alignment subtask of the plagiarism detection competition at PAN 2015 that significantly improves on the previous PAN 2014 approach and hence outperforms the best-performing system of PAN 2014.
Old and new challenges in automatic plagiarism detection
TLDR
The nature of the plagiarism problem is explored, the approaches used so far for its detection are summarized, and a number of methods used to measure text reuse are discussed.
A Hybrid Architecture for Plagiarism Detection
TLDR
A hybrid plagiarism detection architecture that operates on the two principal forms of text plagiarism, paraphrasing and modified cut-and-paste, and that contains a text alignment component robust against word choice and phrasing changes that do not alter the basic ordering.
Plagiarism Detection using ROUGE and WordNet
TLDR
This study proposes adopting ROUGE and WordNet for plagiarism detection: the former includes n-gram co-occurrence statistics, skip-bigram, and longest common subsequence (LCS), while the latter acts as a thesaurus and provides semantic information.
Three Way Search Engine Queries with Multi-feature Document Comparison for Plagiarism Detection
TLDR
The proposed methodology was the best-performing one for long-term operation and also the most cost-effective one at the PAN 2012 plagiarism detection competition.
Plagiarism Detection through Multilevel Text Comparison
  • M. Zini, M. Fabbri, M. Moneglia, A. Panunzi
  • Computer Science
  • 2006 Second International Conference on Automated Production of Cross Media Content for Multi-Channel Distribution (AXMEDIS'06)
  • 2006
TLDR
A recursive plagiarism evaluation function, based on the Levenshtein edit distance and evaluated at each level of the document structure, is proposed, along with a method that eliminates unnecessary chunk comparisons by skipping the similarity calculation for chunks that do not share enough 4-grams.
Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection
TLDR
The P4P corpus is created, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection, providing critical insights for the improvement of automatic plagiarism detection systems.
An Evaluation Framework for Plagiarism Detection
TLDR
Empirical evidence is given that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.