A proposal to automatically build and maintain gazetteers for Named Entity Recognition by using Wikipedia
This paper describes a method to automatically create and maintain gazetteers for Named Entity Recognition (NER). This method extracts the necessary information from linguistic resources. Our …
Automatic Extraction of Arabic Multiword Expressions
This paper investigates the automatic acquisition of Arabic multiword expressions (MWEs) by proposing three complementary approaches to extract MWEs from available data resources and measuring the quality and coverage of the output against gold standards.
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and …
A Multifaceted Evaluation of Neural versus Phrase-Based Machine Translation for 9 Language Directions
Translations produced by neural machine translation systems are found to be considerably different from those produced by phrase-based systems, as well as more fluent and more accurate in terms of word order.
Named Entity WordNet
This paper presents the automatic extension of Princeton WordNet with Named Entities (NEs) and explores different aspects of the methodology, such as the treatment of polysemous terms, the identification of hyponyms within the Wikipedia categorization system, the identification of Wikipedia articles that are NEs, and the design of an NE repository compliant with the LMF ISO standard.
An Automatically Built Named Entity Lexicon for Arabic
The approach behind the automatic Multilingual, Interoperable Named Entity Lexicon is successfully adapted to Arabic and extended, using Arabic WordNet (AWN) and Arabic Wikipedia (AWK), to build the largest, most mature and well-structured Arabic NE lexical resource to date.
Translators’ perceptions of literary post-editing using statistical and neural machine translation
In the context of recent improvements in the quality of machine translation (MT) output and new use cases being found for that output, this article reports on an experiment using statistical and …
Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation
This work compares three approaches to statistical machine translation through a fine-grained manual evaluation based on error annotation of the systems' outputs, and finds that the best-performing system reduces the errors produced by the worst system by 54%.
A Set of Recommendations for Assessing Human-Machine Parity in Language Translation
The paper shows that professional human translations contain significantly fewer errors than machine translations, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations.
Towards Using Web-Crawled Data for Domain Adaptation in Statistical Machine Translation
This paper presents a strategy for obtaining domain-specific monolingual and parallel data through domain-focused web crawling, and for exploiting that data for testing, language modelling, and system tuning in a phrase-based machine translation framework.