Nabil Hathout

This paper presents a word sense disambiguation method in which senses are determined using a dictionary. We use a semantic proximity measure between the words of the dictionary that takes into account its whole topology, with the dictionary viewed as a graph over its entries. We have tested the method on the problem of disambiguating the dictionary entries themselves, …
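As a rough illustration of a proximity measure that depends on the whole graph rather than only on direct links, the sketch below compares two words by the distributions of short random walks started from each of them. This is a generic random-walk technique chosen for illustration, not the measure defined in the paper. The graph construction (a headword linked to every word of its definition, plus self-loops), the walk length, and all function names are assumptions.

```python
from collections import defaultdict


def build_graph(dictionary):
    """Undirected graph linking each headword to the words in its definition.
    Self-loops are added (an assumption) so that every node has a neighbour."""
    graph = defaultdict(set)
    for headword, definition in dictionary.items():
        graph[headword].add(headword)
        for word in definition.split():
            graph[headword].add(word)
            graph[word].add(headword)
            graph[word].add(word)
    return graph


def walk_distribution(graph, start, steps):
    """Probability of being at each node after `steps` uniform random-walk steps."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbours = graph[node]
            share = p / len(neighbours)
            for n in neighbours:
                nxt[n] += share
        dist = dict(nxt)
    return dist


def proximity(graph, a, b, steps=3):
    """Cosine similarity between the random-walk distributions started at a and b."""
    da = walk_distribution(graph, a, steps)
    db = walk_distribution(graph, b, steps)
    dot = sum(p * db.get(n, 0.0) for n, p in da.items())
    na = sum(p * p for p in da.values()) ** 0.5
    nb = sum(p * p for p in db.values()) ** 0.5
    return dot / (na * nb)
```

With a toy dictionary, `cat` and `dog` get a positive proximity because both reach `animal` within a few steps, while `cat` and `car` lie in disconnected parts of the graph and score 0.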
Morphological resources such as CELEX do not exist for many languages. NLP and IR systems that process texts and documents written in these languages must therefore rely on morphological resources acquired from lexica or corpora. Such resources usually suffer from low precision because no a priori semantic knowledge is used to acquire them.
In this paper we address several problems raised by reusing LADL's Lexicon-Grammar (LG) for NLP. This major source of lexical knowledge about French verbs has been publicly available on the Internet for several years. However, the NLP community has not used it, mainly because of its format: ASCII files, each containing a table of binary values …
This paper reports on the procedure and learning models we adopted for the 'PAN 2011 Author Identification' challenge, which targets real-world email messages. The novelty of our approach lies in a design that combines shallow characteristics of the emails (word and trigram frequencies) with a large number of ad hoc, linguistically rich features addressing …
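The shallow part of such a feature set can be sketched as relative frequencies of words and trigrams per message. The abstract does not say whether its trigrams are over characters or words; character trigrams are assumed here, and the feature-name prefixes (`w:`, `c3:`) and the function name are hypothetical. This sketch does not cover the linguistically rich features.

```python
import re
from collections import Counter


def shallow_features(text):
    """Relative word frequencies and character-trigram frequencies for one message.
    Words are lowercased; trigrams keep the original casing, since capitalisation
    habits can themselves be a stylistic cue."""
    words = re.findall(r"\w+", text.lower())
    trigrams = [text[i:i + 3] for i in range(len(text) - 2)]
    word_counts = Counter(words)
    tri_counts = Counter(trigrams)
    n_words = max(len(words), 1)   # avoid division by zero on empty messages
    n_tri = max(len(trigrams), 1)
    features = {f"w:{w}": c / n_words for w, c in word_counts.items()}
    features.update({f"c3:{t}": c / n_tri for t, c in tri_counts.items()})
    return features
```

For example, in `"the cat the"` the word `the` has relative frequency 2/3 and the character trigram `the` occurs 2 times out of 9 trigrams.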
Synonym extraction is a difficult task both to perform and to evaluate. Some studies have tried to exploit general dictionaries for this purpose, treating them as graphs in which words are linked by the definitions they appear in, forming a complex network of an arguably semantic nature. The advantage of using a general dictionary lies in its coverage and the availability …
This paper presents GLÀFF, a large-coverage lexicon of French extracted from Wiktionnaire, the collaborative online dictionary. For each entry, GLÀFF contains a morphosyntactic description and a phonemic transcription. It differs from other existing lexicons mainly in its size, its free license, and the possibility of …
Distributional semantics models can be built using a simple bag-of-words representation of a word's contexts (window-based) or using richer syntactic information (syntax-based). Previous studies have compared their relative efficiency without reaching a definitive conclusion, but no such comparison has been carried out on small, specialised corpora.
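A window-based model, the simpler of the two representations above, can be sketched as follows: each word is described by counts of the words that occur within a fixed number of positions on either side, and words are compared by the cosine of these count vectors. The window size of 2, the plain tokenised input, and the function names are assumptions; a syntax-based model would instead count dependency relations from a parser.

```python
import math
from collections import Counter, defaultdict


def window_contexts(tokens, window=2):
    """Map each word to a bag of the words found within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(c * v[k] for k, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0
```

With `window=1` on `"the patient received the drug"`, the vector for `the` pools both of its occurrences and counts `patient`, `received` and `drug` once each.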
The paper presents a computational model that aims to make the morphological structure of the lexicon emerge from the formal and semantic regularities of the words it contains. The model is purely lexeme-based. The proposed morphological structure consists of (1) binary relations that connect each headword with the words morphologically related to it, and …