Learn More
The most important characteristic of text categorization is the high dimensionality even for the moderate size dataset. Feature selection, which can reduce the size of the dimensionality without sacrificing the performance of the categorization and avoid over-fitting, is a commonly used approach in dimensionality reduction. In this paper, we proposed a new(More)
In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit: a b s t r a c t Content-based spam filtering is a binary text categorization problem.(More)
Though the humans are more susceptible to unpleasant stimuli of higher intensity, how the valence intensity of unpleasant stimuli impacts subsequent cognitive processing, and whether this impact increases with the unpleasantness, require clarification. For this purpose, event-related potentials (ERPs) were recorded for highly negative (HN), mildly negative(More)
The high dimensionality of the text categorization raises big hurdles in applying many sophisticated learning algorithms to the text categorization. Feature selection, which reduces the number of features that represent documents, is an absolute requirement in text categorization. In this paper, we proposed a feature selection method, which improved the(More)
The filtering feature-selection algorithm is a kind of important approach to dimensionality reduction in the field of the text categorization. Most of filtering feature-selection algorithms evaluate the significance of a feature for category based on balanced dataset and do not consider the imbalance factor of dataset. In this paper, a new scheme was(More)
Feature selection is one of methods that reduce the size of the number of features in text categorization. In this paper, we proposed a feature selection method, which filtered some features that only rarely occur in one category and do not occur in other categories from the feature subset generated by ambiguity measure method. The experiments show that the(More)
Feature selection is often considered as a key step in text categorization. In this paper, we proposed a new feature selection algorithm, named AD, which comprehensively measures the degree of relevance and distinction of terms occur in document set. We evaluated AD on three benchmark document collections, 20-Newsgroups, Reuters-21578 and WebKB, using two(More)
Text categorization is an important means to process automatically the information which increases exponentially. But due to the high dimensionality of the text corpus, many sophisticated classifiers can not be efficiently and effectively used in text categorization. So feature selection has become a research focus in text categorization. In this paper, we(More)