This paper reports on the development of a Named Entity Recognition (NER) system for South and South East Asian languages, particularly Bengali, Hindi, Telugu, Oriya and Urdu, as part of the IJCNLP-08 NER Shared Task 1. We have used statistical Conditional Random Fields (CRFs). The system makes use of the different contextual information of the…
Most machine transliteration systems transliterate out-of-vocabulary (OOV) words through intermediate phonemic mapping. A framework has been presented that allows direct orthographical mapping between two languages that are of different origins and employ different alphabet sets. A modified joint source–channel model along with a number of alternatives have…
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports on the…
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports on the…
This paper reports on the development of a Named Entity Recognition (NER) system for Bengali using statistical Conditional Random Fields (CRFs). The system makes use of the different contextual information of the words along with a variety of features that are helpful in predicting the various named entity (NE) classes. A portion of the partially…
… news corpus, we identify various word-level orthographic features to use in the POS taggers. The lexicon and a Named Entity Recognition (NER) system, developed using this corpus, are also used in POS tagging. The POS taggers are then evaluated with Hindi and Telugu data. Evaluation results demonstrate that SVM performs better than HMM for all the…
Named Entity Recognition and Classification (NERC) is a well-studied NLP task typically focused on coarse-grained named entity (NE) classes. NERC for more fine-grained semantic NE classes has not been systematically studied. This paper quantifies the difficulty of fine-grained NERC (FG-NERC) when performed at large scale on the people domain. We apply…
This paper reports on our work in the NEWS 2009 Machine Transliteration Shared Task held as part of ACL-IJCNLP 2009. We submitted one standard run and two non-standard runs for English to Hindi transliteration. The modified joint source-channel model has been used along with a number of alternatives. The system has been trained on the NEWS 2009 Machine…