The Persian language is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in important ways from languages such as English, the design of information retrieval systems for Persian requires special consideration. However, there…
The Persian language is one of the languages of the Middle East, so a significant number of Persian documents are available on the Web. However, there are relatively few studies in the literature on the retrieval of Persian documents. In this experimental study, we assessed term-based and N-gram-based vector space models and a query expansion method, namely, local context…
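A minimal sketch of how an N-gram-based vector space model can rank documents, assuming character trigrams within words, raw term-frequency weights, and cosine similarity. The abstract does not state the N, the weighting scheme, or the tokenization used, so `n=3` and the helper names `char_ngrams`, `cosine`, and `rank` are illustrative assumptions.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams within each whitespace-separated word.

    Each word is padded with spaces so that word boundaries also produce n-grams.
    """
    grams: Counter = Counter()
    for word in text.split():
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing keys, so absent n-grams contribute nothing.
    dot = sum(count * b[g] for g, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: list[str], n: int = 3) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by descending similarity to the query."""
    q = char_ngrams(query, n)
    scores = [(i, cosine(q, char_ngrams(d, n))) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because n-grams match fragments of words rather than whole terms, a query can still score against inflected or compound forms without a stemmer, which is one common motivation for trying N-gram indexing on morphologically rich languages.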
One of the fundamental tasks in natural language processing is part-of-speech (POS) tagging. A POS tagger is a piece of software that reads text in some language and assigns a part-of-speech tag to each word. Our main interest in this research was to see how easy it is to apply methods developed for a language such as English to a new and different…
This paper describes the creation of a test collection for Persian part-of-speech tagging experiments. The collection was created by modifying a manually POS-tagged Persian corpus of over two million words. The original collection had a tag set of 550 tags, more than any machine learning algorithm can handle. The number…
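The abstract is cut off before it explains how the 550-tag set was reduced, so the following is only a hypothetical sketch of one common approach: collapsing hierarchical tags by keeping their leading components. It assumes, purely for illustration, that tags are underscore-separated strings such as `N_SING_LOC`; the tag names and the helpers `coarse_tag`, `reduce_corpus`, and `tagset_size` are not taken from the paper.

```python
def coarse_tag(fine_tag: str, keep: int = 1, sep: str = "_") -> str:
    """Collapse a fine-grained tag by keeping only its first `keep` components.

    Hypothetical assumption: tags are hierarchical strings like "N_SING_LOC",
    where later components add finer distinctions.
    """
    return sep.join(fine_tag.split(sep)[:keep])


def reduce_corpus(tagged: list[tuple[str, str]], keep: int = 1) -> list[tuple[str, str]]:
    """Rewrite every (word, tag) pair with its collapsed tag."""
    return [(word, coarse_tag(tag, keep)) for word, tag in tagged]


def tagset_size(tagged: list[tuple[str, str]]) -> int:
    """Number of distinct tags used in a tagged corpus."""
    return len({tag for _, tag in tagged})
```

Raising `keep` trades a smaller tag set for finer distinctions, so the reduction level can be tuned against how much training data each remaining tag receives.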
With the rapid growth of information sources, it is essential to develop methods that retrieve the most relevant information according to user requirements. One way to improve retrieval quality is to use more than one retrieval engine, merge the retrieved results, and show the user a single ranked list. There are studies that suggest…
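The abstract does not say which merging method the authors used. As an illustration of the general idea only, here is a minimal sketch of CombSUM, a well-known fusion technique swapped in here, applied to min-max normalized scores. It assumes each engine returns a mapping from document id to score; the function names are illustrative.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one engine's scores to [0, 1] so that different engines are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally; treat them as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Fuse several engines' results with CombSUM over normalized scores.

    A document missing from an engine's results contributes 0 for that engine.
    """
    fused: dict[str, float] = {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Normalization matters because raw scores from different engines live on different scales; without it, one engine could dominate the merged list simply because its scores are numerically larger.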
Persian (Farsi) is one of the languages of the Middle East. A significant number of Persian documents are available in digital form, and more are created every day. Therefore, there is a need for high-precision information retrieval systems for this language. This paper discusses the design, implementation, and testing of a Fuzzy…
One of the major activities in natural language processing is determining a word's part-of-speech (POS) tag. In this research, we focus on improving the accuracy of Persian POS tagging by applying post-processing heuristic rules. To evaluate the effects of those rules, we use the Bijankhan tagged corpus, and for tagging, Maximum Likelihood Estimation…
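A minimal sketch of the pipeline the abstract outlines: a Maximum Likelihood Estimation tagger followed by heuristic post-processing. It assumes the MLE tagger is the usual unigram baseline that gives each word the tag it most often carried in training, with unseen words falling back to the most frequent tag overall. The abstract does not list the heuristic rules, so `apply_rules` only shows where they would plug in; the class and function names are illustrative.

```python
from collections import Counter, defaultdict


class MLETagger:
    """Unigram tagger choosing argmax_t P(t | w), estimated from tag counts.

    Words never seen in training receive the most frequent tag overall.
    """

    def __init__(self, tagged_sentences):
        self.word_tags = defaultdict(Counter)
        all_tags = Counter()
        for sentence in tagged_sentences:
            for word, tag in sentence:
                self.word_tags[word][tag] += 1
                all_tags[tag] += 1
        if not all_tags:
            raise ValueError("training corpus is empty")
        self.default_tag = all_tags.most_common(1)[0][0]

    def tag(self, words):
        tags = []
        for word in words:
            # .get() does not trigger the defaultdict factory for unseen words.
            counts = self.word_tags.get(word)
            tags.append(counts.most_common(1)[0][0] if counts else self.default_tag)
        return tags


def apply_rules(words, tags, rules):
    """Apply heuristic post-processing rules in order.

    Each rule is a function (words, tags) -> new list of tags. The rules here
    are placeholders; the paper's actual rules are not given in the abstract.
    """
    for rule in rules:
        tags = rule(words, tags)
    return tags
```

Keeping the rules as independent functions makes it easy to measure the accuracy effect of each one by tagging a held-out set with and without it.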
TextWise LLC participated in the TREC-7 Cross-Language Retrieval track using the CINDOR system, which utilizes a "conceptual interlingua" representation of documents and queries. The current CINDOR research system uses a conceptual interlingua constructed around the Princeton WordNet, which we are mapping into French and Spanish. The use of an…