A Comparative Study on Feature Selection in Text Categorization

@inproceedings{Yang1997ACS,
  title={A Comparative Study on Feature Selection in Text Categorization},
  author={Yiming Yang and Jan O. Pedersen},
  booktitle={ICML},
  year={1997}
}
This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated, including term selection based on document frequency (DF), information gain (IG), mutual information (MI), a χ² test (CHI), and term strength (TS). We found IG and CHI most effective in our experiments. Using IG thresholding with a k-nearest neighbor classifier on the Reuters corpus, removal of up to … of unique terms…
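
Three of the criteria named in the abstract (DF, IG, and CHI) can be illustrated on a toy corpus. The following is a minimal sketch, not the authors' implementation: it assumes the common textbook definitions (IG as the reduction in category entropy from knowing whether a term is present, and χ² computed from the 2×2 term/category contingency table), which may differ in detail from the paper's formulation. The function names, the `toy_docs` corpus, and the top-k selection rule are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, category) pairs. Not data from the paper.
toy_docs = [
    (["wheat", "farm", "price"], "grain"),
    (["wheat", "harvest"], "grain"),
    (["stock", "price", "market"], "finance"),
    (["stock", "bank"], "finance"),
]


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df


def _entropy(label_counts):
    """Shannon entropy (in bits) of a category count distribution."""
    total = sum(label_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total)
        for n in label_counts.values()
        if n > 0
    )


def information_gain(docs, term):
    """Category entropy minus the entropy conditioned on term presence."""
    all_labels = Counter(label for _tokens, label in docs)
    with_term = Counter(label for tokens, label in docs if term in tokens)
    without_term = all_labels - with_term
    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with
    conditional = (
        (n_with / n) * _entropy(with_term)
        + (n_without / n) * _entropy(without_term)
    )
    return _entropy(all_labels) - conditional


def chi_square(docs, term, category):
    """Chi-square statistic of the 2x2 term/category contingency table."""
    a = sum(1 for t, c in docs if term in t and c == category)
    b = sum(1 for t, c in docs if term in t and c != category)
    c_ = sum(1 for t, c in docs if term not in t and c == category)
    d = sum(1 for t, c in docs if term not in t and c != category)
    n = a + b + c_ + d
    denominator = (a + c_) * (b + d) * (a + b) * (c_ + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c_ * b) ** 2 / denominator


def select_top_terms(docs, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocabulary = {term for tokens, _label in docs for term in tokens}
    ranked = sorted(vocabulary, key=lambda t: (-information_gain(docs, t), t))
    return ranked[:k]
```

On this toy corpus, "wheat" and "stock" each separate the two categories perfectly and therefore receive the maximum score, while "price" occurs equally often in both categories and scores zero under both IG and χ². Thresholding then amounts to keeping only the highest-scoring terms, as in `select_top_terms(toy_docs, 2)`.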

Citations

Semantic Scholar estimates that this publication has 6,615 citations based on the available data.


