Data Set Used
In this paper, we describe COLABA, a large effort to create resources and processing tools for Dialectal Arabic Blogs. We describe the objectives of the project, the process flow and the interaction between the different components. We briefly describe the manual annotation effort and the resources created. Finally, we sketch how these resources and tools…
MAGEAD is a morphological analyzer and generator for Modern Standard Arabic (MSA) and its dialects. We introduced MAGEAD in previous work with an implementation of MSA and Levantine Arabic verbs. In this paper, we port that system to MSA nominals (nouns and adjectives), which are far more complex to model than verbs. Our system is a functional morphological…
Implementations of models of morphologically rich languages such as Arabic typically achieve speed and small memory footprint at the cost of abandoning linguistically abstract and elegant representations. We present a solution to modeling rich morphologies that is both fast and based on linguistically rich representations. In our approach, we convert a…
DIRA is a query expansion tool that generates search terms in Standard Arabic and/or its dialects when provided with queries in English or Standard Arabic. The retrieval of dialectal Arabic text has recently become necessary due to the increase of dialectal content on social media. DIRA addresses the challenges of retrieving information in Arabic dialects,…
We introduce a novel task, that of associating relative time with cities in text. We show that the task can be performed using NLP tools and techniques. The task is deployed on a large corpus of data to study a specific phenomenon, namely the temporal dimension of contemporary arts globalization over the first decade of the 21st century.
In this paper, we attempt to summarize online discussions by filtering posts. Selecting the highly related posts from the discussion boards leads to a summarized version of the discussion. Online Discussion Summarizer (ODS) is based on unsupervised information retrieval techniques. The summarization function uses four features, which are the term…