Building Arabic corpora from Wikisource

@inproceedings{Bensalem2013BuildingAC,
  title={Building Arabic corpora from Wikisource},
  author={Imene Bensalem and Salim Chikhi and Paolo Rosso},
  booktitle={2013 ACS International Conference on Computer Systems and Applications (AICCSA)},
  year={2013},
  pages={1--2}
}
This paper describes a new tool that helps extract clean text from the Arabic Wikisource dump in order to build corpora. The tool's purpose is illustrated by generating a subcorpus from Wikisource, which is a step towards building an evaluation corpus for Arabic intrinsic plagiarism detection.
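The abstract does not describe how the tool works internally, so the following is only a hypothetical sketch of the general task (pulling clean text out of a Wikisource dump), not the authors' tool. It assumes the dump follows the standard MediaWiki XML export format (`<page>`, `<title>`, `<ns>`, `<redirect>`, `<text>`) and uses only the Python standard library. The names `clean_wikitext`, `iter_pages`, and `SAMPLE_DUMP`, the choice of namespace `"0"` (main articles), and the regex cleaning rules are all illustrative assumptions; real Wikisource pages use many templates and tags these rules do not handle.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical cleaning rules for common MediaWiki markup (not the paper's tool).
_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")             # innermost {{...}} template
_REF = re.compile(r"<ref[^>/]*/>|<ref[^>]*>.*?</ref>", re.DOTALL)
_TAG = re.compile(r"<[^>]+>")                          # any remaining HTML tag
_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")  # [[target|label]] -> label
_EMPHASIS = re.compile(r"'{2,}")                       # ''italic'' and '''bold'''
_HEADING = re.compile(r"^=+\s*(.*?)\s*=+\s*$", re.MULTILINE)


def clean_wikitext(text):
    """Strip common wiki markup and return the readable text."""
    previous = None
    while previous != text:  # peel nested templates from the inside out
        previous = text
        text = _TEMPLATE.sub("", text)
    for pattern, repl in ((_REF, ""), (_TAG, ""), (_LINK, r"\1"),
                          (_EMPHASIS, ""), (_HEADING, r"\1")):
        text = pattern.sub(repl, text)
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse blank lines left behind
    return text.strip()


def iter_pages(dump, namespace="0"):
    """Yield (title, clean_text) for non-redirect pages in one namespace."""
    title = ns = text = None
    is_redirect = False
    for _, elem in ET.iterparse(dump, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the export-schema namespace
        if tag == "title":
            title = elem.text
        elif tag == "ns":
            ns = elem.text
        elif tag == "redirect":
            is_redirect = True
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            if ns == namespace and not is_redirect:
                yield title, clean_wikitext(text or "")
            title = ns = text = None
            is_redirect = False
            elem.clear()  # keep memory flat on large dumps


# Tiny made-up dump: one article, one redirect, one talk page.
SAMPLE_DUMP = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    "<page><title>قصيدة</title><ns>0</ns><revision><text>"
    "{{رأسية|مؤلف=س}}\n== الأبيات ==\n'''بيت''' من [[الشعر|شعر]] عربي."
    "&lt;ref&gt;مصدر&lt;/ref&gt;</text></revision></page>"
    '<page><title>تحويل</title><ns>0</ns><redirect title="قصيدة" />'
    "<revision><text>#تحويل [[قصيدة]]</text></revision></page>"
    "<page><title>نقاش:قصيدة</title><ns>1</ns>"
    "<revision><text>talk page</text></revision></page>"
    "</mediawiki>"
)

if __name__ == "__main__":
    for page_title, body in iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))):
        print(page_title, body, sep="\n")
```

Streaming with `iterparse` and clearing each finished `<page>` is a common way to keep memory use low, since full Wikisource dumps can be large. Skipping redirects and non-article namespaces mirrors the intuitive goal of keeping only actual texts, but which pages belong in a corpus is a design decision the sketch does not settle.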

2 Figures & Tables