Yassine Benajiba

The task of Named Entity Recognition (NER) is to identify proper names, as well as temporal and numeric expressions, in open-domain text. NER systems have proved to be very important for many tasks in Natural Language Processing (NLP), such as Information Retrieval and Question Answering. Unfortunately, the main efforts to build reliable NER systems...
The Named Entity Recognition (NER) task has been garnering significant attention in NLP as it helps improve the performance of many natural language processing applications. In this paper, we investigate the impact of using different sets of features in two discriminative machine learning frameworks, namely, Support Vector Machines and Conditional Random...
In this paper we describe an improved version of ANERsys, an Arabic Named Entity Recognition system for open-domain texts. The first version of ANERsys was based entirely on the Maximum Entropy approach and was trained and tested on corpora that we built ourselves. The results showed that Maximum Entropy is an appropriate method to identify Named...
In this paper, we describe COLABA, a large effort to create resources and processing tools for Dialectal Arabic Blogs. We describe the objectives of the project, the process flow and the interaction between the different components. We briefly describe the manual annotation effort and the resources created. Finally, we sketch how these resources and tools...
Building an accurate Named Entity Recognition (NER) system for languages with complex morphology is a challenging task. In this paper, we present research that explores the feature space using both gold and bootstrapped noisy features to build an improved, highly accurate Arabic NER system. We bootstrap noisy features by projection from an Arabic-English...
This paper describes the Question Answering for Machine Reading (QA4MRE) Main Task at the 2013 Cross Language Evaluation Forum. In the main task, systems answered multiple-choice questions on documents concerned with four different topics. There were also two pilot tasks, Machine Reading on Biomedical Texts about Alzheimer's disease, and Japanese Entrance...