Information-theoretic feature selection algorithms for text classification

  title={Information-theoretic feature selection algorithms for text classification},
  author={Jana Novovicov{\'a} and Anton{\'i}n Mal{\'i}k},
  journal={Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.},
  pages={3272-3277 vol. 5}
A major characteristic of the text document classification problem is the extremely high dimensionality of text data. In this paper, we present four new algorithms for feature/word selection for text classification. We use sequential forward selection methods based on improved mutual information criterion functions. We discuss the performance of the proposed evaluation functions compared with information gain, which evaluates features individually. We present experimental results using…
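
To illustrate the contrast the abstract draws between evaluating words individually and selecting them sequentially, here is a minimal Python sketch. It is not the paper's method: the four proposed criterion functions are not given in this excerpt, so the score used below (relevance to the class minus mean mutual information with already selected words) is an assumed stand-in, and the toy corpus, word names, and function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Return I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_individually(columns, labels):
    """Rank words by I(word; class) alone, ignoring interactions between words."""
    relevance = {w: mutual_information(col, labels) for w, col in columns.items()}
    return sorted(columns, key=lambda w: -relevance[w])


def forward_select(columns, labels, k):
    """Greedy sequential forward selection of k words.

    Assumed criterion (NOT taken from the paper):
    I(word; class) minus the mean I(word; s) over already selected words s.
    """
    relevance = {w: mutual_information(col, labels) for w, col in columns.items()}
    selected = []
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for w, col in columns.items():
            if w in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(col, columns[s]) for s in selected
                ) / len(selected)
            score = relevance[w] - redundancy
            if score > best_score:
                best, best_score = w, score
        selected.append(best)
    return selected


# Hypothetical toy corpus: 8 documents, binary word occurrence, binary class.
labels = [1, 1, 1, 1, 1, 0, 0, 0]
columns = {
    "goal":   [1, 1, 1, 1, 0, 0, 0, 0],
    "goals":  [1, 1, 1, 1, 0, 0, 0, 0],  # same occurrence pattern as "goal"
    "season": [1, 1, 0, 0, 1, 1, 0, 0],  # weaker, but independent of "goal"
}

top_individual = rank_individually(columns, labels)[:2]
top_sequential = forward_select(columns, labels, 2)
print(top_individual, top_sequential)
```

On this toy data the individual ranking keeps both "goal" and its duplicate "goals", while the greedy pass takes "goal" and then prefers "season", because the duplicate adds no information beyond the word already chosen. The cost is that each step rescans all remaining words, which matters at the vocabulary sizes the abstract refers to.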