Alberto Barrón-Cedeño

Learn More
We present an evaluation framework for plagiarism detection. 1 The framework provides performance measures that address the specifics of plagiarism detection , and the PAN-PC-10 corpus, which contains 64 558 artificial and 4 000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the(More)
This paper overviews 15 plagiarism detectors that have been evaluated within the fourth international competition on plagiarism detection at PAN'12. We report on their performances for two sub-tasks of external plagiarism detection: candidate document retrieval and detailed document comparison. Furthermore , we introduce the PAN plagiarism corpus 2012, the(More)
This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10. We start with a unified retrieval process that summarizes the best practices employed this year. Then, the detectors' performances are evaluated in detail, highlighting several important aspects of plagiarism detection , such as obfuscation, intrinsic vs.(More)
Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (i) a(More)
When automatic plagiarism detection is carried out considering a reference corpus, a suspicious text is compared to a set of original documents in order to relate the plagiarised text fragments to their potential source. One of the biggest difficulties in this task is to locate plagiarised fragments that have been modified (by rewording, insertion or(More)
The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, has been originally developed to extract mul-tiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the obtained output is refined. The first modification aims to maximise the number of(More)
Automatic plagiarism detection considering a reference corpus compares a suspicious text to a set of original documents in order to relate the plagiarised fragments to their potential source. Publications on this task often assume that the search space (the set of reference documents) is a narrow set where any search strategy will produce a good output in a(More)
The simple access to texts on digital libraries and the WWW has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at large. Various methods for automatic plagiarism detection have been developed whose objective is to assist human experts to analyze documents for plagiarism. Unlike other(More)
Plagiarism, the unacknowledged reuse of text, has increased in recent years due to the large amount of texts readily available. For instance, recent studies claim that nowadays a high rate of student reports include plagiarism, making manual plagiarism detection practically infeasible. Automatic plagiarism detection tools assist experts to analyse(More)