Non-Dictionary-Based Thai Word Segmentation Using Decision Trees

Thanaruk Theeramunkong and Sasiporn Usanavasin
For languages without word boundary delimiters, dictionaries are needed to segment running text. This makes segmentation accuracy depend heavily on the quality of the dictionary used for analysis. If the dictionary is not good enough, many words will be unknown or unrecognized, and these unrecognized words reduce segmentation accuracy. To solve this problem, we propose a method based on decision tree models. Without use of a dictionary…
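As a rough illustration of segmentation without a dictionary, the sketch below treats every gap between two characters as a yes/no "word boundary here?" decision and trains a small ID3 decision tree on pre-segmented text. This is a swapped-in toy, not the authors' method: the unit (single characters), the feature window `WINDOW`, the helper names (`features`, `examples`, `build`, `classify`, `segment`), and the romanized training corpus are all hypothetical, since the excerpt does not show the paper's actual units or features.

```python
from collections import Counter
import math

WINDOW = 2  # hypothetical: characters examined on each side of a candidate boundary


def features(text, i):
    """Features for the gap before text[i]: characters at offsets -WINDOW..WINDOW-1, '#' as padding."""
    return tuple(text[i + k] if 0 <= i + k < len(text) else "#" for k in range(-WINDOW, WINDOW))


def examples(words):
    """Turn one pre-segmented sentence into (features, is_boundary) pairs, one per inner gap."""
    text = "".join(words)
    cuts, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return [(features(text, i), i in cuts) for i in range(1, len(text))]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build(rows, attrs):
    """ID3: split on the attribute whose split leaves the least entropy (highest information gain)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: a bool label

    def remainder(a):
        groups = {}
        for x, y in rows:
            groups.setdefault(x[a], []).append(y)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(attrs, key=remainder)
    branches = {}
    for value in {x[best] for x, _ in rows}:
        subset = [(x, y) for x, y in rows if x[best] == value]
        branches[value] = build(subset, [a for a in attrs if a != best])
    return (best, branches, majority)  # internal node; majority is the fallback for unseen values


def classify(tree, x):
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(x[attr], majority)
    return tree


def segment(tree, text):
    """Cut text wherever the tree predicts a boundary."""
    words, start = [], 0
    for i in range(1, len(text)):
        if classify(tree, features(text, i)):
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


# Hypothetical toy training data (romanized, not Thai).
CORPUS = [["the", "cat", "sat"], ["a", "cat", "ran"], ["the", "dog", "ran"]]
ROWS = [row for sentence in CORPUS for row in examples(sentence)]
TREE = build(ROWS, list(range(2 * WINDOW)))

if __name__ == "__main__":
    print(segment(TREE, "thecatsat"))  # ['the', 'cat', 'sat']
```

No feature tuple in this toy corpus appears with conflicting labels, so the fully grown tree reproduces the training segmentations exactly. On unseen strings it falls back to the majority label at the first unseen feature value, which is where a real system would also need pruning and far richer training data.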
This paper has 33 citations.

6 Figures & Tables

Extracted Numerical Results

  • The TCC corpus has 100% recall, … precision, and 44.93% accuracy. Using the decision tree learned from a Thai corpus, the precision improves and the accuracy increases up to 85.51-87…%, while recall drops to 63.72-94.52%.
  • For a lower CF, say 50%, recall … but precision and accuracy dramatically improve to … and 85.51%, respectively.
  • Each …ion of the strings in r…. Recall, precision, …s between 50% and …. The method presented … both in precision …. The original TCC … recall but has 52.12% …. The decision tree learned … up to 94.11-99.85% … .41%.
  • In our experiment, the CF is set to 70%, which gives a recall of 96.13%, a precision of 91.92%, and an accuracy of 87.41%.
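The results above report recall, precision, and accuracy, but this excerpt does not define them. A minimal sketch, assuming boundary-level definitions: precision is the share of predicted word boundaries that are true boundaries, and recall is the share of true boundaries that were predicted. Under this assumption a splitter that over-segments but never merges across a true boundary scores 100% recall with lower precision, which matches the shape of the TCC figures. The paper's accuracy measure is not defined here, so it is not reproduced, and the function names are hypothetical.

```python
def boundaries(words):
    """Character offsets of the internal word boundaries in a segmentation."""
    cuts, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return cuts


def boundary_precision_recall(gold, predicted):
    """Boundary-level (precision, recall) of a predicted segmentation against a gold one."""
    if "".join(gold) != "".join(predicted):
        raise ValueError("segmentations must cover the same text")
    g, p = boundaries(gold), boundaries(predicted)
    correct = len(g & p)
    # Convention chosen here: with nothing predicted (or nothing to find), there are no errors.
    precision = correct / len(p) if p else 1.0
    recall = correct / len(g) if g else 1.0
    return precision, recall


if __name__ == "__main__":
    gold = ["the", "cat", "sat"]
    # Over-segmentation keeps every true boundary: perfect recall, lower precision.
    print(boundary_precision_recall(gold, ["th", "e", "cat", "s", "at"]))  # (0.5, 1.0)
    # Under-segmentation misses a boundary: perfect precision, lower recall.
    print(boundary_precision_recall(gold, ["the", "catsat"]))  # (1.0, 0.5)
```

Word-level scoring (counting whole words whose start and end both match) is a common alternative; under it an over-segmenter would not reach 100% recall, which is why the boundary-level reading is assumed here.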