Corpus ID: 15419664

Language Independent Named Entity Recognition in Indian Languages

Asif Ekbal, Rejwanul Haque, Amitava Das, Venkateswarlu Poka, Sivaji Bandyopadhyay
This paper reports on the development of a Named Entity Recognition (NER) system for South and South East Asian languages, particularly Bengali, Hindi, Telugu, Oriya and Urdu, as part of the IJCNLP-08 NER Shared Task. We have …
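The abstract is truncated, so the authors' actual feature set cannot be recovered from it. As a sketch of what "language independent" commonly means in NER, the function below extracts only surface cues that need no language-specific resources: the word itself, its prefixes and suffixes, neighbouring words, and digit flags. It leaves out capitalization on purpose, since Indic scripts such as Devanagari and Bengali have no letter case. The function name `token_features`, the window size and the affix length are illustrative assumptions, not values from the paper.

```python
def token_features(tokens, i, window=2, affix_len=3):
    """Surface features for tokens[i]; no dictionaries or POS taggers needed.

    window and affix_len are illustrative defaults, not values from the paper.
    """
    word = tokens[i]
    feats = {
        "word": word,
        "length": len(word),
        # str.isdigit also accepts non-Latin digits such as Devanagari "२"
        "is_digit": word.isdigit(),
        "has_digit": any(ch.isdigit() for ch in word),
    }
    # Character prefixes and suffixes (counted in Unicode code points)
    for k in range(1, affix_len + 1):
        feats[f"prefix_{k}"] = word[:k]
        feats[f"suffix_{k}"] = word[-k:]
    # Neighbouring words in a symmetric window, padded at sentence edges
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"word[{offset:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


tokens = ["Sachin", "Mumbai", "mein", "rahte", "hain"]  # romanized Hindi
print(token_features(tokens, 1)["suffix_3"])  # bai
```

A sequence model such as a CRF or maximum entropy tagger would consume one such dictionary per token; because every cue is computed from characters and positions alone, the same extractor can be reused unchanged across the five languages.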


Named Entity Recognition in Hindi Using Hidden Markov Model
This paper discusses NER in Hindi using a Hidden Markov Model (HMM), along with the challenges faced when performing NER in Indian languages.
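An HMM tagger labels a sentence by finding the most probable tag sequence, which is normally done with Viterbi decoding. The sketch below shows that decoding step for a first-order HMM, working in log space to avoid underflow. The three-tag set, every probability, and the `unk` floor for words never seen with a tag are made-up toy values, not estimates from this or any other paper listed here.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p, unk=1e-6):
    """Most probable state (tag) sequence for obs under a first-order HMM.

    unk is a hypothetical emission floor for words never seen with a tag.
    """
    # best[t][s]: log-probability of the best path ending in tag s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s].get(obs[0], unk)) for s in states}]
    back = [{}]  # back[t][s]: predecessor tag on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + math.log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s].get(obs[t], unk)))
            back[t][s] = prev
    # Follow back-pointers from the best final tag
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy parameters; emission rows list only three words of a larger vocabulary.
STATES = ["PER", "LOC", "O"]
START = {"PER": 0.4, "LOC": 0.2, "O": 0.4}
TRANS = {
    "PER": {"PER": 0.2, "LOC": 0.4, "O": 0.4},
    "LOC": {"PER": 0.2, "LOC": 0.2, "O": 0.6},
    "O":   {"PER": 0.3, "LOC": 0.3, "O": 0.4},
}
EMIT = {
    "PER": {"Ram": 0.6, "Delhi": 0.05, "gaya": 0.01},
    "LOC": {"Ram": 0.05, "Delhi": 0.6, "gaya": 0.01},
    "O":   {"Ram": 0.01, "Delhi": 0.01, "gaya": 0.5},
}
print(viterbi(["Ram", "Delhi", "gaya"], STATES, START, TRANS, EMIT))
# ['PER', 'LOC', 'O']
```

In a real tagger the start, transition and emission tables would be estimated by counting over an annotated corpus, and the flat `unk` floor would be replaced by a proper unknown-word model.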
Study of Named Entity Recognition for Indian Languages
  • H. Shah
  • Computer Science, Linguistics
  • 2016
A comparative study of named entity recognition approaches finds that the CRF approach performs best at identifying named entities in Indian languages.
Named Entity Recognition: A Survey for the Indian Languages
A brief overview of NER and its issues in the Indian languages is presented, along with the F-measure results obtained for the different Indian languages.
Named Entity Recognition for Gujarati: A CRF Based Approach
An NER tagger is built using Conditional Random Fields (CRF) that identifies person, location and organization names with an F1-score of 0.832.
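F-measure figures such as the 0.832 F1-score above combine precision P and recall R as F1 = 2PR / (P + R). The following is a minimal sketch of entity-level scoring, assuming exact matching of (start, end, type) spans, which is one common convention; the spans in the example are invented.

```python
def prf(gold, pred):
    """Precision, recall and F1 over entity spans.

    gold and pred are sets of (start, end, type) tuples; a predicted entity
    counts as correct only if both its boundaries and its type match exactly.
    """
    tp = len(gold & pred)                        # true positives
    p = tp / len(pred) if pred else 0.0          # precision
    r = tp / len(gold) if gold else 0.0          # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0   # harmonic mean of P and R
    return p, r, f1


gold = {(0, 1, "PER"), (1, 2, "LOC"), (4, 6, "ORG")}
pred = {(0, 1, "PER"), (1, 2, "PER"), (4, 6, "ORG")}  # one type error
print(prf(gold, pred))
```

Here two of the three predictions match, so P = R = F1 = 2/3. The span labelled PER instead of LOC counts both as a false positive and as a false negative, which is why exact-match scoring is relatively strict.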
A Survey on Named Entity Recognition in Indian Languages with particular reference to Telugu
A survey of various approaches for identification of Named Entities (NE) in Indian Languages is presented and the observations and research related to NER are critically described.
The results presented were obtained by performing NER in Hindi, Bengali and Telugu using a Hidden Markov Model (HMM) and are evaluated with standard performance metrics.
The task of discovering proper nouns in a given document and categorizing them into their respective classes is explained, and different approaches to NER are described.
Named Entity Recognition for South and South East Asian Languages: Taking Stock
A brief discussion of the problem of Named Entity Recognition (NER) in the context of the IJCNLP workshop on NER for South and South East Asian languages is presented, along with the development of a named-entity-annotated corpus in five South Asian languages.
Rule-Based Named Entity Recognition in Urdu
It is concluded that the NER computational models for Hindi cannot be applied to Urdu and a rule-based Urdu NER algorithm is presented that outperforms the models that use statistical learning.
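A rule-based tagger of this general kind can be as simple as an ordered list of checks applied to each token. The sketch below is not the paper's algorithm: the gazetteer, the title cue words and the place-name endings are hypothetical examples, and the tokens are romanized rather than written in Urdu script for readability.

```python
# All three resources are hypothetical examples, not taken from the Urdu paper.
GAZETTEER = {"Lahore": "LOC", "Karachi": "LOC"}  # known names and their types
PERSON_CUES = {"Janab", "Shri", "Dr."}           # title words that precede a person name
LOC_SUFFIXES = ("pur", "abad", "garh")           # common place-name endings


def rule_tag(tokens):
    """Tag each token PER, LOC or O using ordered hand-written rules."""
    tags = []
    for i, tok in enumerate(tokens):
        if tok in GAZETTEER:                           # rule 1: exact gazetteer lookup
            tags.append(GAZETTEER[tok])
        elif i > 0 and tokens[i - 1] in PERSON_CUES:   # rule 2: word after a title cue
            tags.append("PER")
        elif tok.endswith(LOC_SUFFIXES):               # rule 3: place-name suffix
            tags.append("LOC")
        else:
            tags.append("O")
    return tags


print(rule_tag(["Janab", "Akbar", "Lahore", "se", "Hyderabad", "gaye"]))
# ['O', 'PER', 'LOC', 'O', 'LOC', 'O']
```

Rule order matters: the gazetteer is consulted first because it is the most reliable evidence, while the suffix rule is a fallback that will also fire on ordinary words that happen to end in those letters.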
Handling Unknown Words in Named Entity Recognition using Transliteration
This paper discusses how transliteration helps in handling unknown words in Named Entity Recognition (NER) and shows some results on unknown-word handling in NER using transliteration.


Named Entity Recognition and transliteration in Bengali
The paper reports on the development of a Named Entity Recognition (NER) system in Bengali using a tagged Bengali news corpus, and on the subsequent transliteration of the recognized Bengali Named …
Improving Machine Translation Quality with Automatic Named Entity Recognition
  • Bogdan Babych, Anthony F. Hartley
  • Computer Science
    Proceedings of the 7th International EAMT workshop on MT and other Language Technology Tools, Improving MT through other Language Technology Tools Resources and Tools for Building MT - EAMT '03
  • 2003
An experiment in which MT input was processed using output from the named entity recognition module of Sheffield's GATE information extraction (IE) system shows a gain in MT quality, indicating that specific components of IE technology could boost the performance of current MT systems.
Named entities : recognition, classification and use
A survey of named entity recognition and classification and a note on the semantic and morphological properties of proper names in the Prolex project.
Rapid development of Hindi named entity recognition using conditional random fields and feature induction
This paper describes the application of conditional random fields with feature induction to a Hindi named entity recognition task and uses a combination of a Gaussian prior and early stopping based on the results of 10-fold cross validation.
A web-based Bengali news corpus for named entity recognition
A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper, and Named Entity Recognition systems based on pattern-based shallow parsing, with or without linguistic knowledge, have been developed using part of this corpus.
A Maximum Entropy Approach to Named Entity Recognition
This thesis describes a novel statistical named-entity recognition system known as MENE (Maximum Entropy Named Entity), and demonstrates the trans-lingual portability of the system, which was competitive with the best systems built by native Japanese speakers despite the fact that the author speaks no Japanese.
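A conditional maximum entropy classifier scores each candidate tag y for a token x as p(y | x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), where the f_i are binary features, the λ_i are learned weights, and Z(x) normalizes over tags. The sketch below computes that distribution from a set of active feature names and a weight table. It is a generic illustration, not MENE's feature set; the feature names and weights are invented, and training the weights is not shown.

```python
import math

def maxent_probs(active_feats, weights, labels):
    """Conditional maximum entropy (log-linear) distribution over labels.

    weights maps (feature, label) pairs to lambda values; pairs absent from
    the table contribute 0.
    """
    scores = {y: sum(weights.get((f, y), 0.0) for f in active_feats) for y in labels}
    top = max(scores.values())  # subtract the max before exp() for numerical stability
    unnorm = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(unnorm.values())    # partition function Z(x)
    return {y: v / z for y, v in unnorm.items()}


weights = {("prev=Shri", "PER"): 2.0, ("suffix=pur", "LOC"): 1.5, ("prev=Shri", "O"): -0.5}
probs = maxent_probs({"prev=Shri", "suffix=pur"}, weights, ["PER", "LOC", "O"])
print(max(probs, key=probs.get))  # PER
```

Because each feature is paired with a label, overlapping and interdependent cues (a title word before the token, a place-name suffix on it) can be combined freely without the independence assumptions an HMM makes about its emissions.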
Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons
This work has shown that conditionally-trained models, such as conditional maximum entropy models, handle inter-dependent features of greedy sequence modeling in NLP well.
Description of the Japanese NE System Used for MET-2
In this paper, experiments on the Japanese Named Entity task are reported using a supervised learning mechanism, which is easier than creating complicated patterns.
Learning to Tag Multilingual Texts Through Observation
This paper describes RoboTag, an advanced prototype of a machine-learning-based multilingual information extraction system, together with the general client/server architecture it uses for learning from observation, and presents experimental results comparing RoboTag to both human-tagged keys and the best hand-coded rule systems.
An Algorithm that Learns What's in a Name
IdentiFinder™, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities, is evaluated and is competitive with approaches based on handcrafted rules on mixed-case text and superior on text where case information is not available.