Discriminative features for text document classification

Author: Kari Torkkola
Journal: Formal Pattern Analysis & Applications
The bag-of-words approach to text document representation typically yields document vectors with on the order of 5,000–20,000 components. To make effective use of various statistical classifiers, it may be necessary to reduce the dimensionality of this representation. We point out deficiencies in the class discrimination of two popular dimensionality reduction methods: Latent Semantic Indexing (LSI) and sequential feature selection according to some relevant criterion. As a remedy, we suggest…
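To make the bag-of-words representation in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's method: the whitespace tokenizer, the toy documents, and the helper names `build_vocabulary` and `bag_of_words` are all assumptions introduced for this example.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Return a term-count vector with one component per vocabulary entry."""
    counts = Counter(doc.lower().split())
    vec = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:  # tokens outside the vocabulary are ignored
            vec[vocab[tok]] = n
    return vec


# Toy corpus (hypothetical); a real corpus yields thousands of distinct terms.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]

print(len(vocab))  # number of components in each document vector
print(vectors)
```

Each document vector has one component per distinct term, so its length grows with the vocabulary. That growth is why a realistic corpus reaches the thousands of components mentioned above and why dimensionality reduction becomes relevant.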
This paper has 48 citations.


Publications citing this paper.

A Feature Selection Based on Minimum Upper Bound of Bayes Error

2005 IEEE 7th Workshop on Multimedia Signal Processing • 2005

Optimizing features by correlating for concept labeling in text classification

2014 IEEE International Advance Computing Conference (IACC) • 2014
