Diacritic Annotation in the Arabic Treebank and its Impact on Parser Evaluation

@inproceedings{Maamouri2008DiacriticAI,
  title={Diacritic Annotation in the Arabic Treebank and its Impact on Parser Evaluation},
  author={Mohamed Maamouri and Seth Kulick and Ann Bies},
  booktitle={LREC},
  year={2008}
}
The question concerns the evaluation framework rather than parser results: should evaluation use unvocalized or vocalized forms? The unvocalized form is sometimes assumed to be "real-world" data, but it is not an accurate representation of such data. The unvocalized form is the vocalized form with diacritics stripped out, which is not necessarily the unchanged input data. The vocalized form contains diacritics that are not in the input data, plus some orthographic normalization. Roughly 3.7% of tokens include some form of orthographic normalization.
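The relationship between the two forms can be illustrated with a minimal Python sketch. It assumes tokens are plain Unicode Arabic strings and that "diacritics" means the marks in U+064B to U+0652 plus the dagger alif (U+0670); both are assumptions made for illustration, not the treebank's own definition or tooling. The function names `strip_diacritics` and `differs_beyond_diacritics` are hypothetical.

```python
import re

# Assumed diacritic set: tanwin, short vowels, shadda, sukun (U+064B..U+0652)
# and the dagger alif (U+0670). The treebank's actual inventory may differ.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(token: str) -> str:
    """Derive an unvocalized form by deleting diacritics from a vocalized token."""
    return DIACRITICS.sub("", token)


def differs_beyond_diacritics(vocalized: str, source: str) -> bool:
    """True when the stripped vocalized token does not match the source token,
    i.e. something other than diacritics (such as normalization) changed."""
    return strip_diacritics(vocalized) != source


# Hypothetical token pair: the vocalized form carries a hamza on the alif
# that the source text lacks, so stripping does not recover the source.
vocalized = "\u0623\u064E\u0646\u0651\u064E"
source = "\u0627\u0646"
print(differs_beyond_diacritics(vocalized, source))
```

Counting the tokens for which such a check returns true is one way a figure like the normalization rate quoted above could be estimated, provided the original source text is available alongside the annotated form.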
