The Persian language is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in nature from languages such as English, designing information retrieval systems for Persian requires special considerations. However, there…
Abstract: One of the…
  • Fahimeh Raja, Hadi Amiri, Samira Tasharofi, Mehdi Sarmadi, Hossein Hojjat, Farhad Oroumchian
  • 2007
One of the fundamental tasks in natural language processing is part-of-speech (POS) tagging. A POS tagger is a piece of software that reads text in some language and assigns a part-of-speech tag to each word. Our main interest in this research was to see how easy it is to apply methods used in a language such as English to a new and different…
The Persian language is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. However, there are relatively few studies on the retrieval of Persian documents in the literature. In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, namely Local…
  • Farhad Oroumchian, Samira Tasharofi, Hadi Amiri, Hossein Hojjat, Fahimeh Raja, Fahima Raja
  • 2007
This paper describes the creation of a test collection for Persian part-of-speech tagging experiments. The collection was created by modifying a manually part-of-speech (POS) tagged Persian corpus of over two million tagged words. The original collection had a tag set of 550 tags, more than any machine learning algorithm can handle. The number…
TextWise LLC participated in the TREC-7 Cross-Language Retrieval track using the CINDOR system, which utilizes a "conceptual interlingua" representation of documents and queries. The current CINDOR research system uses a conceptual interlingua constructed around the Princeton WordNet, which we are mapping into French and Spanish. The use of an…
The increasing importance of Unicode for text encoding implies a possible doubling of data storage space and data transmission time, with a corresponding need for data compression. The approach presented in this paper aims to reduce the storage space and transmission time for Persian text files in web-based applications and on the Internet. The basic idea here is to…