An Open-Source Finite State Morphological Transducer for Modern Standard Arabic

Mohammed Attia, Pavel Pecina, Antonio Toral, Lamia Tounsi, Josef van Genabith

We develop an open-source large-scale finite-state morphological processing toolkit (AraComLex) for Modern Standard Arabic (MSA) distributed under the GPLv3 license. The morphological transducer is based on a lexical database specifically constructed for this purpose. In contrast to previous resources, the database is tuned to MSA, eliminating lexical entries no longer attested in contemporary use. The database is built using a corpus of 1,089,111,204 words, a pre-annotation tool, machine…
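As a rough illustration of what a finite-state morphological transducer does, the sketch below maps surface words to analyses by walking a tiny lexicon arranged in the spirit of the continuation classes used in lexc-style finite-state morphology (Beesley and Karttunen 2003): an optional proclitic, an optional definite article, a noun stem, and an optional possessive enclitic. It is a minimal sketch under stated assumptions, not AraComLex's implementation: the state names, tags, and two-stem lexicon are hypothetical, words are written in undiacritized Buckwalter transliteration, and orthographic alternation rules are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon, not AraComLex data.
# Each arc: (surface substring consumed, analysis string emitted, next state).
Arc = Tuple[str, str, str]

STEMS = [("ktAb", "ktAb+NOUN"), ("qlm", "qlm+NOUN")]  # "book", "pen"

TRANSDUCER: Dict[str, List[Arc]] = {
    # Optional proclitic: w- "and" or b- "with/in"; "" is an epsilon arc.
    "start": [("w", "w+CONJ+", "det"), ("b", "b+PREP+", "det"), ("", "", "det")],
    # Optional definite article; a definite stem may not take an enclitic.
    "det": [("Al", "Al+DET+", "stem_def"), ("", "", "stem_indef")],
    "stem_def": [(s, a, "end") for s, a in STEMS],
    "stem_indef": [(s, a, "encl") for s, a in STEMS],
    # Optional possessive enclitic: -h "his", -hA "her".
    "encl": [("h", "+POSS.3MS", "end"), ("hA", "+POSS.3FS", "end"), ("", "", "end")],
    "end": [],
}
FINAL = {"end"}


def analyze(word: str) -> List[str]:
    """Return every analysis whose path consumes the whole word and ends in a final state."""
    results: List[str] = []
    # Depth-first search over (state, input position, analysis built so far).
    # The arc graph is acyclic, so epsilon arcs cannot loop forever.
    stack = [("start", 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if state in FINAL and pos == len(word):
            results.append(out)
        for surf, analysis, nxt in TRANSDUCER[state]:
            if word.startswith(surf, pos):
                stack.append((nxt, pos + len(surf), out + analysis))
    return sorted(results)


if __name__ == "__main__":
    for w in ["wAlktAb", "ktAbhA", "bAlqlm", "AlktAbh"]:
        print(w, analyze(w))
```

With this toy lexicon, `analyze("wAlktAb")` returns the single analysis `w+CONJ+Al+DET+ktAb+NOUN`, while `analyze("AlktAbh")` returns an empty list because the sketch forbids a possessive enclitic after the definite article. A real transducer would also need rewrite rules for alternations such as ta marbuta surfacing as t before an enclitic, which this sketch leaves out.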
This paper has 23 citations.

Publications citing this paper (17 extracted citations).

A single-model approach for Arabic segmentation, POS tagging, and named entity recognition

2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP) • 2018

Interoperable Arabic language resources building and exploitation in SAFAR platform

2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA) • 2016

Building a lexical semantic resource for Arabic morphological Patterns

2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA) • 2013

Publications referenced by this paper (23 references).

LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1

M. Maamouri, D. Graff, B. Bouziri, S. Krouna, S. Kulick
LDC Catalog No. LDC2010L01. ISBN: 1-58563-555-3. • 2010

Buckwalter Arabic Morphological Analyzer (BAMA) Version 2.0

T. Buckwalter
LDC Catalog No. LDC2004L02. ISBN: 1-58563-324-0. • 2004

Finite State Morphology (CSLI Studies in Computational Linguistics)

K. R. Beesley, L. Karttunen
Stanford, Calif.: CSLI Publications. • 2003

Arabic Gigaword Fourth Edition

R. Parker, D. Graff, K. Chen, J. Kong, K. Maeda
LDC Catalog No. LDC2009T30. ISBN: 1-58563-532-4. • 2009

The Oxford Guide to Practical Lexicography

B.T.S. Atkins, M. Rundell
Oxford University Press. • 2008