Text Classification in Asian Languages without Word Segmentation

@inproceedings{Peng2003TextCI,
  title={Text Classification in Asian Languages without Word Segmentation},
  author={Fuchun Peng and Xiangji Huang and Dale Schuurmans and Shaojun Wang},
  year={2003}
}
We present a simple approach for Asian language text classification without word segmentation, based on statistical n-gram language modeling. In particular, we examine Chinese and Japanese text classification. By using character n-gram models, our approach avoids word segmentation. However, unlike traditional ad hoc n-gram models, the statistical language modeling based approach has a strong information-theoretic basis and avoids an explicit feature selection procedure which potentially loses…
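
The idea in the abstract, classifying text by scoring it under one character n-gram language model per class, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `CharNgramClassifier`, the add-one smoothing, the start-of-text padding character, and the uniform class prior are all assumptions chosen to keep the example short. The abstract does not say which smoothing method the paper uses.

```python
import math
from collections import Counter, defaultdict


class CharNgramClassifier:
    """Hypothetical sketch: one character n-gram language model per class.

    Assumptions (not taken from the paper): add-one smoothing and a
    uniform prior over classes.
    """

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = defaultdict(Counter)    # label -> n-gram counts
        self.context_counts = defaultdict(Counter)  # label -> (n-1)-gram context counts
        self.vocab = set()                          # characters seen in training

    def _ngrams(self, text):
        # Pad the start so the first characters also have a full-length context.
        padded = "\x02" * (self.n - 1) + text
        for i in range(len(text)):
            yield padded[i:i + self.n]

    def fit(self, texts, labels):
        # Count character n-grams directly; no word segmentation is performed.
        for text, label in zip(texts, labels):
            self.vocab.update(text)
            for gram in self._ngrams(text):
                self.ngram_counts[label][gram] += 1
                self.context_counts[label][gram[:-1]] += 1
        return self

    def log_likelihood(self, text, label):
        # Add-one smoothed log P(text | label); +1 reserves mass for unseen characters.
        v = len(self.vocab) + 1
        total = 0.0
        for gram in self._ngrams(text):
            num = self.ngram_counts[label][gram] + 1
            den = self.context_counts[label][gram[:-1]] + v
            total += math.log(num / den)
        return total

    def predict(self, text):
        # Pick the class whose language model assigns the highest likelihood.
        return max(self.ngram_counts,
                   key=lambda label: self.log_likelihood(text, label))
```

Because every character n-gram contributes to the score, no separate feature selection step is needed in this sketch. To use it, call `fit` with parallel lists of raw strings and labels, then `predict` on an unsegmented string.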

Citations

Publications citing this paper.
A logistic regression-based smoothing method for Chinese text categorization

Classification : With n-Gram Language Models

URL-Based Web Page Classification: With n-Gram Language Models

Using some web content mining techniques for Arabic text classification

Author Identification on Noise Arabic Documents

  • 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT)
  • 2018

Arabic Opinion Mining Using Parallel Decision Trees

  • 2017 Palestinian International Conference on Information and Communication Technology (PICICT)
  • 2017

Document classification through image-based character embedding and wildcard training

  • 2016 IEEE International Conference on Big Data (Big Data)
  • 2016
