Development of Bengali Named Entity Tagged Corpus and its Use in NER Systems

@inproceedings{Ekbal2008DevelopmentOB,
  title={Development of Bengali Named Entity Tagged Corpus and its Use in NER Systems},
  author={Asif Ekbal and Sivaji Bandyopadhyay},
  booktitle={IJCNLP},
  year={2008}
}
The rapid development of language tools using machine learning techniques for less computerized languages requires an appropriately tagged corpus. A Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper. A web crawler retrieves the web pages in HyperText Markup Language (HTML) format from the news archive. At present, the corpus contains approximately 34 million wordforms. The date, location, reporter and agency tags present in the web pages have been…
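The abstract does not say how the crawled HTML pages are turned into the plain-text wordforms that make up the corpus. The sketch below illustrates one plausible step only. It uses Python's standard `html.parser` to collect the visible text of a page, skipping `<script>` and `<style>` content. It then splits that text on whitespace into wordforms, trimming the Bengali danda (`।`) and common punctuation from token edges. The class `TextExtractor`, the function `html_to_wordforms`, and the whitespace-based tokenization are hypothetical assumptions for illustration, not the authors' method.

```python
from html.parser import HTMLParser

# Hypothetical set of characters trimmed from token edges: Bengali danda and
# double danda plus common ASCII punctuation (an assumption, not from the paper).
PUNCT = "।॥,;:!?\"'()[]{}.-"


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, ignoring <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_wordforms(html: str) -> list[str]:
    """Strip markup from one page and return its whitespace-delimited wordforms."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Whitespace tokenization is a simplifying assumption; edge punctuation is trimmed.
    tokens = (t.strip(PUNCT) for t in text.split())
    return [t for t in tokens if t]


if __name__ == "__main__":
    page = "<html><body><p>আমি বাংলায় গান গাই।</p><script>var x = 1;</script></body></html>"
    print(len(html_to_wordforms(page)))  # number of wordforms on this page
```

Summing `len(html_to_wordforms(page))` over every crawled page gives a rough corpus size in wordforms. A real pipeline would also need to locate the date, location, reporter and agency fields in the page markup, which this sketch does not attempt.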