• Corpus ID: 7559749

Extracting an English-Persian Parallel Corpus from Comparable Corpora

Authors: Akbar Karimi, Ebrahim Ansari, Bahram Sadeghi Bigham
Parallel data are an essential part of a reliable statistical machine translation (SMT) system: the more such data are available, the better the quality of the system. However, for some language pairs, such as Persian-English, parallel sources of this kind are scarce. In this paper, a bidirectional method is proposed to extract parallel sentences from document-aligned English and Persian Wikipedia. Two machine translation systems are employed to translate from Persian to English and the…


