Batuer Aisha

  • Citations Per Year
Learn More
Tokenization is very important for Uyghur language processing. Tokenization of Uyghur, an agglutinative language, is quite different from other languages such as Chinese and English. In this paper we propose a two-steps statistical tokenization method for Uyghur. Two related factors, the feature template scheme and the manually tokenized corpora, are also(More)
  • Batuer Aisha
  • 2010 International Conference on Asian Language…
  • 2010
In this paper, we present a letter tagging approach(LTA) to Uyghur tokenization. Experiments show that the problem with label bias (rich and complex suffixes) problem to be resolved using LTA combined with CRFs, so it is more effective than previous work, the accuracy of word tokenization reaches 93.3%. In future our tokenization research will be very(More)
In this paper, we propose a novel method for improving the classification performance of short text strings using conditional random fields (CRFs) that combine morphological information. Experimental results on three datasets (Uyghur, Chinese, and English) demonstrate that our method can yield higher classification accuracy than Support Vector Machine (SVM)(More)
  • 1