Building Arabic corpora from Wikisource

Imene Bensalem, Salim Chikhi, Paolo Rosso
2013 ACS International Conference on Computer Systems and Applications (AICCSA)
This paper describes a new tool that helps extract clean text from the Arabic Wikisource dump in order to build corpora. The tool's purpose is illustrated by generating a subcorpus from Wikisource, a step towards building an evaluation corpus for Arabic intrinsic plagiarism detection.
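
The abstract does not describe how the tool works internally, so the following is only a rough sketch of what extracting clean text from a Wikisource dump can look like. It is not the authors' tool. It assumes a standard MediaWiki XML export in which each `page` element carries `title`, `ns`, and `text` children; the function names (`iter_pages`, `strip_wiki_markup`), the regular expressions, and the tiny in-memory sample are all hypothetical choices made for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical two-page sample in the shape of a MediaWiki XML export.
# Markup inside <text> is XML-escaped, as it is in real dumps.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>قصيدة</title>
    <ns>0</ns>
    <revision><text>{{ترويسة|عنوان=قصيدة}}
== مقدمة ==
هذا '''نص''' من [[ويكي مصدر|المكتبة]].&lt;ref&gt;حاشية&lt;/ref&gt;</text></revision>
  </page>
  <page>
    <title>نقاش:قصيدة</title>
    <ns>1</ns>
    <revision><text>تعليق</text></revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the XML namespace prefix: '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def strip_wiki_markup(text):
    """Rough wikitext cleanup (assumption: regexes are enough for a sketch)."""
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)  # HTML comments
    previous = None
    while previous != text:  # peel nested templates from the inside out
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    text = re.sub(r"<ref[^>]*?/>", "", text)  # self-closing footnote tags
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # footnotes
    text = re.sub(r"<[^>]+>", "", text)  # any remaining tags
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # keep link label
    text = re.sub(r"'{2,}", "", text)  # bold and italic quotes
    # Headings: '== title ==' -> 'title'
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.MULTILINE)
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()


def iter_pages(source):
    """Yield (title, clean_text) for main-namespace pages (ns == '0')."""
    title = ns = text = None
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if ns == "0" and text:
                yield title, strip_wiki_markup(text)
            title = ns = text = None
            elem.clear()  # free memory; real dumps are large


if __name__ == "__main__":
    for page_title, body in iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(page_title)
        print(body)
```

Streaming with `iterparse` keeps memory use flat on a full dump, and filtering on `ns` skips talk and other non-content pages. A real corpus builder would likely need a proper wikitext parser, since regular expressions miss tables, category links, and other constructs.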
