Zobia Rehman

Urdu is a morphologically rich language whose characters differ in nature from those of Latin-script languages. Urdu text tokenization and sentence boundary disambiguation are more difficult than in languages such as English. The major hurdle for tokenization is the inconsistent use of space between words, whereas the absence of case discrimination makes sentence boundary detection a difficult task. …
Sentence boundary identification is a preliminary step in preparing a text document for natural language processing tasks, e.g., machine translation, POS tagging, and text summarization. We present a hybrid approach to Urdu sentence boundary disambiguation comprising a unigram statistical model and a rule-based algorithm. After implementing this …
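The hybrid idea of combining unigram statistics with rules can be sketched as follows. This is a minimal illustration, not the paper's actual model: the abbreviation set stands in for unigram evidence gathered from a corpus, and the rule is simply that a period after a known abbreviation does not end a sentence.

```python
# Minimal sketch of hybrid sentence boundary disambiguation
# (unigram statistics + rules). The abbreviation list and the rule
# below are illustrative, not the paper's actual model.

ABBREVIATIONS = {"dr", "prof", "e.g", "i.e"}   # hypothetical unigram evidence

def split_sentences(text):
    """Split on '.', '?', '!' unless the token before '.' looks like a
    known abbreviation (a rule backed by unigram corpus statistics)."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(("?", "!")):
            sentences.append(" ".join(current))
            current = []
        elif token.endswith("."):
            stem = token[:-1].lower()
            if stem not in ABBREVIATIONS:   # rule: abbreviations do not end sentences
                sentences.append(" ".join(current))
                current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

# split_sentences("Dr. Smith arrived. He was late.")
# → ["Dr. Smith arrived.", "He was late."]
```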
Sentence boundary identification is an important step for text processing tasks, e.g., machine translation, POS tagging, and text summarization. In this paper, we present an approach comprising a feed-forward neural network (FFNN) together with part-of-speech information for the words in a corpus. The proposed adaptive system has been tested after training it …
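The shape of such a classifier can be sketched as below: a feed-forward network scoring a candidate boundary from the part-of-speech tags of the surrounding words. The tag set, feature encoding, layer sizes, and (untrained) weights are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Illustrative FFNN scoring a candidate sentence boundary from POS
# features of the neighbouring words; weights here are random, not trained.
rng = np.random.default_rng(0)
POS_TAGS = ["NOUN", "VERB", "ADJ", "PUNCT"]    # hypothetical tag set

def encode(prev_tag, next_tag):
    """One-hot encode the POS tags of the words around the candidate mark."""
    x = np.zeros(2 * len(POS_TAGS))
    x[POS_TAGS.index(prev_tag)] = 1.0
    x[len(POS_TAGS) + POS_TAGS.index(next_tag)] = 1.0
    return x

# One hidden layer with sigmoid activations.
W1 = rng.normal(size=(8, 2 * len(POS_TAGS)))
b1 = np.zeros(8)
W2 = rng.normal(size=8)
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boundary_probability(prev_tag, next_tag):
    """Forward pass: probability that the candidate mark ends a sentence."""
    h = sigmoid(W1 @ encode(prev_tag, next_tag) + b1)
    return sigmoid(W2 @ h + b2)
```

Training would fit `W1`, `b1`, `W2`, `b2` on labelled boundaries; only the forward pass is shown here.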
The spellings of a language's words are standardized by language authorities or consortia and made available in dictionaries or lexicons. For instance, “produkt” does not belong to the English dictionary. Similarly, “نايمرد” is a correctly spelled word, while “رنايمرد” is a non-word in Urdu. Electronic representation of text is commonly used in today's computing …
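The dictionary-lookup notion of a non-word described above can be sketched in a few lines. The tiny lexicon is a stand-in for a full Urdu dictionary; the word strings are taken verbatim from the abstract.

```python
# Minimal sketch of non-word detection by lexicon lookup; the one-entry
# lexicon below stands in for a full Urdu dictionary.

LEXICON = {"نايمرد"}   # correctly spelled words (illustrative)

def is_non_word(word, lexicon=LEXICON):
    """A token is a non-word if it does not appear in the lexicon."""
    return word not in lexicon

# is_non_word("رنايمرد")  → True  (not in the lexicon)
# is_non_word("نايمرد")   → False (a dictionary word)
```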
  • Ruba Talal Ibrahim, Zahraa Tariq Mohammed, +11 authors Haruna Chiroma
  • 2017
Because of the computational drawbacks of conventional numerical methods in solving complex optimization problems, researchers may have to rely on meta-heuristic algorithms. Particle swarm optimization (PSO) is one of the most widely used algorithms due to its simplicity of implementation and fast convergence speed. Also, the cuckoo search algorithm is a …
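For reference, the standard PSO update that the abstract alludes to can be sketched as below, minimizing the sphere function f(x) = Σ xᵢ². The inertia and acceleration coefficients are common textbook defaults; this is plain PSO, not the hybrid PSO–cuckoo variant the paper studies.

```python
import random

# Minimal particle swarm optimization sketch (textbook defaults, not the
# paper's hybrid variant): each particle is pulled toward its personal
# best and the swarm's global best position.
def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# best, val = pso(lambda x: sum(t * t for t in x))  # val ≈ 0 near the optimum
```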
Text tokenization is a fundamental pre-processing step for almost all information processing applications. The task is nontrivial for scarce-resourced languages such as Urdu, where there is inconsistent use of space between words. In this paper, a morpheme-matching-based approach is proposed for Urdu text tokenization, along with some other …
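One common form of morpheme matching is greedy longest-match segmentation, sketched below; it illustrates how tokens can be recovered when spaces are missing. The English lexicon is illustrative only (the paper targets Urdu), and the paper's actual matching strategy may differ.

```python
# Minimal longest-match (maximal munch) segmentation against a morpheme
# lexicon, one way to tokenize text with missing spaces. The English
# lexicon is illustrative; the paper targets Urdu.

MORPHEMES = {"sun", "flower", "sunflower", "seed", "seeds"}

def tokenize(text, lexicon=MORPHEMES):
    tokens, i = [], 0
    while i < len(text):
        # Try the longest lexicon entry starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])   # unknown character: emit as-is
            i += 1
    return tokens

# tokenize("sunflowerseeds") → ["sunflower", "seeds"]
```

Greedy longest match is simple but can mis-segment when a shorter prefix is the correct morpheme; statistical disambiguation is one common remedy.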