Computational detection of Uyghur multiword expressions
- Murat Orhun
- 2011 IEEE 3rd International Conference on…
Abstract This paper presents a probabilistic model for automatically tagging names in Turkish text. We model names using four information sources and combine them successfully. The first is a lexical model based on the surface forms of the words. Adding contextual cues to this lexical model yields a significant improvement. We then model the morphological analyses of the words and, finally, the tag sequence, reaching an F-measure of 91.56% in Turkish name tagging. These results show that linguistic information, i.e. the morphological analyses of the words, together with a corpus large enough to train a statistical model, helps this basic information extraction task.
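For readers unfamiliar with the metric reported above, the F-measure is the weighted harmonic mean of precision and recall. The following is a minimal sketch of that standard formula, not the paper's evaluation script: the function name `f_measure` and the counts in the example are hypothetical, and it assumes names are scored from counts of true positives, false positives, and false negatives.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score computed from raw counts.

    tp: names tagged correctly
    fp: spans tagged as names that are not names
    fn: names the tagger missed
    beta: weight of recall relative to precision (1.0 gives the balanced F1)
    """
    if tp == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only; they are not taken from the paper.
score = f_measure(tp=90, fp=10, fn=10)
print(f"F-measure: {score:.2%}")
```

With `beta` left at 1.0, precision and recall are weighted equally, which is the usual reading of an unqualified "F-measure" in name tagging work.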