Toward Optimal Feature Selection in Naive Bayes for Text Categorization

Bo Tang, Steven Kay, Haibo He
IEEE Transactions on Knowledge and Data Engineering

Automated feature selection is important for text categorization because it reduces the size of the feature set and speeds up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, the Kullback-Leibler divergence and the Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties…
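To make the two measures concrete: for two class-conditional word distributions P and Q over a vocabulary, the Kullback-Leibler divergence is KL(P||Q) = sum_w p_w log(p_w / q_w), and the Jeffreys divergence is its symmetrized form J(P, Q) = KL(P||Q) + KL(Q||P) = sum_w (p_w - q_w) log(p_w / q_w). Each word therefore contributes a nonnegative term to J, which suggests ranking words by that term. The sketch below is a hypothetical illustration of this idea for a two-class multinomial naive Bayes model, not the paper's own selection criterion (the abstract above is truncated before describing it). The function names `class_word_probs` and `rank_features`, the Laplace smoothing parameter `alpha`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def class_word_probs(docs, vocab, alpha=1.0):
    """Hypothetical helper: Laplace-smoothed multinomial estimate of P(w | class)."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w in doc if w in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(P || Q) for two distributions defined on the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def jeffreys_divergence(p, q):
    """Symmetrized KL: J(P, Q) = KL(P || Q) + KL(Q || P)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def rank_features(pos_docs, neg_docs, alpha=1.0):
    """Hypothetical ranking: score each word by its term (p_w - q_w) * log(p_w / q_w)
    in the Jeffreys divergence between the two class-conditional distributions."""
    vocab = {w for doc in pos_docs + neg_docs for w in doc}
    p = class_word_probs(pos_docs, vocab, alpha)
    q = class_word_probs(neg_docs, vocab, alpha)
    scores = {w: (p[w] - q[w]) * math.log(p[w] / q[w]) for w in vocab}
    # Highest score first; break ties alphabetically so the order is deterministic.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked, scores, p, q


# Toy two-class corpus (made-up data, for illustration only).
pos_docs = [["goal", "match", "team"], ["team", "win", "goal"]]
neg_docs = [["vote", "election", "party"], ["party", "vote", "team"]]

ranked, scores, p, q = rank_features(pos_docs, neg_docs)
print(ranked[:3])
print(jeffreys_divergence(p, q))
```

Because the per-word scores sum exactly to J(P, Q), keeping the top-k words retains the largest share of the symmetric divergence between the two classes under this model. A word that appears in both classes (here "team") scores lowest, since its smoothed probabilities are similar under P and Q.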
Highly Cited
This paper has 61 citations.
Related Discussions
This paper has been referenced on Twitter 9 times.


Publications citing this paper.
Selected entries from 32 extracted citations

Feature selection for text classification: A review

Multimedia Tools and Applications • 2018
Highly Influenced

Mining and Sentiment Analysis using Bayesian and Neural Networks Approaches

Olha Shepelenko
Highly Influenced

Identification of Relevant Contextual Dimensions Using Regression Analysis

2018 Eleventh International Conference on Contemporary Computing (IC3) • 2018

Latent Topic Text Representation Learning on Statistical Manifolds

IEEE Transactions on Neural Networks and Learning Systems • 2018

A Novel Sentiment Analysis Technique in Disease Classification

K. Anita Davamani, D. Robin, Kamatchi, Krithika, Manisha

Semantic Scholar estimates that this publication has 61 citations based on the available data.

Publications referenced by this paper.
Selected entries from 48 references

Information Theory and Statistical Mechanics Revisited

ArXiv • 2016
Highly Influenced

Toward integrating feature selection algorithms for classification and clustering

IEEE Transactions on Knowledge and Data Engineering • 2005
Highly Influenced

Exponentially embedded families with class-specific features for classification

P. M. Baggenstoss
IEEE Signal Processing Letters • 2016

ENN: Extended Nearest Neighbor Method for Pattern Recognition [Research Frontier]

IEEE Computational Intelligence Magazine • 2015

KernelADASYN: Kernel based adaptive synthetic data generation for imbalanced learning

2015 IEEE Congress on Evolutionary Computation (CEC) • 2015