Challenges in Developing Persian Corpora from Online Resources

@inproceedings{Ghayoomi2009ChallengesID,
  title={Challenges in Developing Persian Corpora from Online Resources},
  author={Masood Ghayoomi and Saeedeh Momtazi},
  booktitle={2009 International Conference on Asian Language Processing},
  year={2009},
  pages={108--113}
}
Persian is an Indo-European language that has borrowed its script from Arabic, a member of the Semitic language family. Because the Persian and Arabic scripts are so similar, problems arise when processing electronic text. This paper discusses some of the common problems encountered in practice when developing a Persian corpus from online materials. The sources of the problems are the Persian script itself; mixture with the Arabic script; Persian orthography; the typists…

References

Incorporation: Word production of Persian prepositions and its application in computational linguistics

  • Z. Abolhasani, M. Ghayoomi
  • In Proceedings of the 2nd Workshop on the Persian…
  • 2006

PLDB: Persian linguistics database

  • S. M. Assi
  • Pazhuheshgaran (Researchers), Institute for Humanities and Cultural Studies Newsletter
  • 2005

Assessment of a modern Farsi corpus

  • E. Darrudi
  • In Proceedings of the 2nd Workshop on Information…
  • 2004

Persian language and IT

  • S. M. Assi
  • In Proceedings of the 2nd Workshop on Information…
  • 2004

Persian monologue telephone speech database: TFARSDAT

  • M. Bijankhan
  • In Proceedings of the 1st Workshop on Persian…
  • 2004

The Persian Speech Database: FARSDAT

  • M. Bijankhan
  • In Proceedings of the 1st Workshop on Persian…
  • 2004

The Persian dialogue telephone database

  • M. Bijankhan
  • In Proceedings of the 1st Workshop on Persian…
  • 2004

The Persian text corpus

  • M. Bijankhan
  • In Proceedings of the 1st Workshop on Persian…
  • 2004
