Language Model Based Arabic Word Segmentation

@inproceedings{Lee2003LanguageMB,
  title={Language Model Based Arabic Word Segmentation},
  author={Young-Suk Lee and Kishore Papineni and Salim Roukos and Ossama Emam and Hany Hassan},
  booktitle={ACL},
  year={2003}
}
We approximate Arabic’s rich morphology with a model in which a word consists of a sequence of morphemes in the pattern prefix*-stem-suffix* (* denotes zero or more occurrences of a morpheme). Our method is seeded by a small manually segmented Arabic corpus, which it uses to bootstrap an unsupervised algorithm that builds the Arabic word segmenter from a large unsegmented Arabic corpus. The algorithm uses a trigram language model to determine the most probable morpheme sequence for a given input. The…
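As a rough illustration of the idea in the abstract, the sketch below enumerates every prefix*-stem-suffix* split of a single word and keeps the one that a trigram model over morphemes scores highest. It is not the authors' implementation. The affix inventories, the transliterated example word, the `#` and `+` affix markers, the toy probabilities, and the constant floor for unseen trigrams are all assumptions made for this example, and the function names (`candidate_segmentations`, `trigram_logprob`, `segment`) are hypothetical.

```python
import math

# Hypothetical toy affix inventories in a Latin transliteration (assumed, not from the paper).
PREFIXES = {"w", "b", "l", "Al"}
SUFFIXES = {"h", "hA", "At"}


def candidate_segmentations(word, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Yield every split of `word` matching prefix*-stem-suffix* with a non-empty stem."""

    def peel_prefixes(s):
        # Option 1: stop peeling; option 2: remove one known prefix and recurse.
        yield [], s
        for p in prefixes:
            if s.startswith(p) and len(s) > len(p):
                for more, rest in peel_prefixes(s[len(p):]):
                    yield [p + "#"] + more, rest

    def peel_suffixes(s):
        # Same idea from the right-hand side of the string.
        yield s, []
        for x in suffixes:
            if s.endswith(x) and len(s) > len(x):
                for stem, more in peel_suffixes(s[:-len(x)]):
                    yield stem, more + ["+" + x]

    for pre, rest in peel_prefixes(word):
        for stem, suf in peel_suffixes(rest):
            yield pre + [stem] + suf


def trigram_logprob(morphemes, trigram_probs, floor=1e-6):
    """Sum log P(m_i | m_{i-2}, m_{i-1}); unseen trigrams get an assumed constant floor."""
    padded = ["<s>", "<s>"] + list(morphemes) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        total += math.log(trigram_probs.get(tuple(padded[i - 2:i + 1]), floor))
    return total


def segment(word, trigram_probs):
    """Return the candidate segmentation with the highest trigram log-probability."""
    return max(candidate_segmentations(word),
               key=lambda seg: trigram_logprob(seg, trigram_probs))


# Toy trigram probabilities, invented for this example only.
TOY_TRIGRAMS = {
    ("<s>", "<s>", "w#"): 0.2,
    ("<s>", "w#", "ktAb"): 0.1,
    ("w#", "ktAb", "+h"): 0.3,
    ("ktAb", "+h", "</s>"): 0.5,
}

print(segment("wktAbh", TOY_TRIGRAMS))  # → ['w#', 'ktAb', '+h']
```

The sketch scores one word in isolation. A real system would need properly smoothed probabilities estimated from segmented text, and the bootstrapping step described in the abstract is not shown here.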
This paper has highly influenced 12 other papers.
This paper has 161 citations.

Citations

Publications citing this paper.

Arabic diacritic restoration approach based on maximum entropy models

Computer Speech & Language • 2009

Citations per Year (chart; y-axis ticks 0 to 20 citations, x-axis 2003 to 2018)
Semantic Scholar estimates that this publication has 161 citations based on the available data.
