Cross-language plagiarism detection

@article{Potthast2011CrosslanguagePD,
  title={Cross-language plagiarism detection},
  author={Martin Potthast and Alberto Barr{\'o}n-Cede{\~n}o and Benno Stein and P. Rosso},
  journal={Language Resources and Evaluation},
  year={2011},
  volume={45},
  pages={45-62}
}
Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (1) a comprehensive retrieval process for cross-language plagiarism detection is introduced, highlighting the differences to monolingual plagiarism…
Methods for cross-language plagiarism detection
TLDR
This paper proposes a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing, and explores the suitability of three cross-language similarity estimation models.
Plagiarism Detection across Distant Language Pairs
TLDR
Two recently proposed cross-language plagiarism detection methods are compared to a novel approach to this problem, based on machine translation and monolingual similarity analysis (T+MA), and the effectiveness of the three approaches for less related languages is explored.
Cross-Language Plagiarism Detection Using a Multilingual Semantic Network
TLDR
Experimental results indicate that the proposed graph-based approach, which is compared with two state-of-the-art models, is a good alternative for cross-language plagiarism detection.
Cross-lingual text alignment for fine-grained plagiarism detection
TLDR
The proposed approach has two main steps: the first step tries to find candidate plagiarised fragments and focuses on high recall, followed by a more precise similarity analysis based on dynamic text alignment that will filter the results by finding alignments between the identified fragments.
Cross-lingual plagiarism detection techniques for English-Hindi language pairs
TLDR
Different cross-language plagiarism detection approaches in the context of Indian language pairs, such as Hindi-English and English-Hindi, are presented.
Cross-Language Plagiarism Detection Methods
TLDR
The present paper provides a summary of the existing approaches to plagiarism detection in a multilingual context and attempts to show the development of detection approaches from the first experiments based on machine translation pre-processing to the up-to-date knowledge-based systems that proved to obtain reliable results on various corpora.
Cross-language plagiarism detection over continuous-space representations of language
Cross-language (CL) plagiarism detection aims at detecting plagiarised fragments of text among documents in different languages. In this work we perform a comparison of different methods that make…
Cross-language text alignment: A proposed two-level matching scheme for plagiarism detection
TLDR
The experimental results show that the proposed cross-language text alignment approach significantly outperforms the state-of-the-art models and can be fed into an expert system for further improvement of cross-language plagiarism detection.
A New Approach for Cross-Language Plagiarism Analysis
TLDR
A plagiarism detection method composed of five main phases: language normalization, retrieval of candidate documents, classifier training, plagiarism analysis, and post-processing, showing that the method achieved better results with medium and large plagiarized passages.
Using a Dictionary and n-gram Alignment to Improve Fine-grained Cross-Language Plagiarism Detection
TLDR
A novel approach for assessing cross-language similarity between texts for detecting plagiarized cases that has two main steps: a vector-based retrieval framework that focuses on high recall, followed by a more precise similarity analysis based on dynamic text alignment.

References

SHOWING 1-10 OF 43 REFERENCES
On Cross-lingual Plagiarism Analysis using a Statistical Model
TLDR
The process for the automatic cross-lingual plagiarism analysis based on the statistical bilingual dictionary has shown good results and it is considered that it could be useful also for the cross-lingual near-duplicate detection task.
Intrinsic Plagiarism Detection
TLDR
It is shown that it is possible to identify potentially plagiarized passages by analyzing a single document with respect to variations in writing style, and new features for the quantification of style aspects are added.
Multilingual Plagiarism Detection
TLDR
A new method called MLPlag is proposed for plagiarism detection in a multilingual environment based on analysis of word positions which identifies the replacement of synonyms used by plagiarists to hide the document match.
The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages
TLDR
A new, unique and freely available parallel corpus containing European Union documents of mostly legal nature, available in all 20 official EU languages, which is particularly suitable to carry out all types of cross-language research and to test and benchmark text analysis software across different languages.
Old and new challenges in automatic plagiarism detection
TLDR
The nature of the plagiarism problem is explored, and the approaches used so far for its detection are summarized, and a number of methods used to measure text reuse are discussed.
Intrinsic Plagiarism Analysis with Meta Learning
TLDR
A hybrid approach is proposed that employs style marker analysis to generate hypotheses, which are then accepted or rejected by an authorship verification analysis; style markers for German text and their application to a real-world plagiarism case are evaluated.
Automatic Identification of Document Translations in Large Multilingual Document Collections
TLDR
A working system that can identify translations and other very similar documents among a large number of candidates, by representing the document contents with a vector of thesaurus terms from a multilingual thesaurus, and by then measuring the semantic similarity between the vectors.
Using Query-Relevant Documents Pairs for Cross-Lingual Information Retrieval
TLDR
This work proposes to use a training corpus made up of a set of Query-Relevant Document Pairs (QRDP) in a probabilistic cross-lingual information retrieval approach which is based on the IBM alignment model 1 for statistical machine translation.
A statistical approach to crosslingual natural language tasks
TLDR
This work proposes to use a direct probabilistic crosslingual NLP system which integrates both steps, translation and the specific NLP task, into a single one, and uses the statistical IBM 1 word alignment model (M1).
A Wikipedia-Based Multilingual Retrieval Model
TLDR
Results are presented of an extensive analysis that demonstrates the power of this new retrieval model: for a query document d the topically most similar documents from a corpus in another language are properly ranked.