Named Entity Recognition or Extraction (NER) is an important task in automated text processing for industry and academia engaged in language processing, intelligence gathering, and bioinformatics. In this paper we discuss the general problem of Named Entity Recognition and, more specifically, the challenges of NER in languages that do not have(More)
We are interested in contributing a small, publicly available Urdu corpus of written text to the natural language processing community. The Urdu text is stored in the Unicode character set, in its native Arabic script, and marked up according to the Corpus Encoding Standard (CES) XML Document Type Definition (DTD). All the tags and metadata are in English. To date, the corpus(More)
BACKGROUND: Hepatitis B and C are common in Pakistan, and their spread is attributable to various risk factors. One thousand and fifty consecutive male cases suffering from chronic liver disease (327 HBV and 723 HCV) were selected from the OPD of a public sector hospital and a private clinic dealing exclusively with liver patients. To compare the results 723(More)
This paper explains the challenges of Urdu stemming and presents a rule-based prototype, with a few rules implemented for Urdu, to illustrate the intricacies. It shows that Urdu stemming is quite challenging because of Urdu's diverse nature and because Arabic and Farsi stemmers cannot be used for Urdu. Dictionary-based error-correcting schemes used(More)
This paper describes a thesis proposal on concept search in non-English and non-European languages. Urdu is chosen as an example language because of its unique nature, its morphology, and its large number of speakers. Despite its importance, Urdu does not have adequate language resources for research in Information Retrieval (IR). It is shown(More)
The goal of conferences like TREC, TIPSTER, NTCIR, and CLEF is to judge the performance of different algorithms. Most of these conferences have tracks that deal with new and innovative information retrieval problems, but none has worked with Urdu data, primarily because of the lack of resources. In this paper we present a baseline for Urdu IR evaluation(More)
Several algorithms based on link analysis have been developed to measure the importance of nodes in a graph, such as pages on the World Wide Web. PageRank and HITS are the most popular algorithms for ranking the nodes of a directed graph. However, both of these algorithms assign equal importance to all the edges and nodes, ignoring the semantically rich(More)
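To make the "equal importance to all edges" point concrete, here is a minimal sketch of plain PageRank by power iteration. It is the textbook formulation, not the method proposed in the paper above; the function name `pagerank`, the damping value 0.85, the iteration count, and the toy edge list are all illustrative assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch).

    Every outgoing edge of a node receives the same share of that
    node's rank, which is the uniform weighting the abstract refers to.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over the nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Teleportation term: each node gets an equal base amount.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items()):
    print(node, round(score, 4))
```

Because `share` is computed as the source's rank divided by its number of outgoing links, no edge can count for more than another from the same node. A semantically weighted variant would replace that uniform split with per-edge weights.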