Jieming Yang

Learn More
Content-based spam filtering is a binary text categorization problem. To improve the performance of the spam filtering, feature selection, as an important and indispensable means of text categorization, also plays an important role in spam filtering. We proposed a new method, named Bi-Test, which utilizes binomial hypothesis testing to estimate whether the(More)
Though the humans are more susceptible to unpleasant stimuli of higher intensity, how the valence intensity of unpleasant stimuli impacts subsequent cognitive processing, and whether this impact increases with the unpleasantness, require clarification. For this purpose, event-related potentials (ERPs) were recorded for highly negative (HN), mildly negative(More)
The most important characteristic of text categorization is the high dimensionality even for the moderate size dataset. Feature selection, which can reduce the size of the dimensionality without sacrificing the performance of the categorization and avoid over-fitting, is a commonly used approach in dimensionality reduction. In this paper, we proposed a new(More)
The high dimensionality of the text categorization raises big hurdles in applying many sophisticated learning algorithms to the text categorization. Feature selection, which reduces the number of features that represent documents, is an absolute requirement in text categorization. In this paper, we proposed a feature selection method, which improved the(More)
The filtering feature-selection algorithm is a kind of important approach to dimensionality reduction in the field of the text categorization. Most of filtering feature-selection algorithms evaluate the significance of a feature for category based on balanced dataset and do not consider the imbalance factor of dataset. In this paper, a new scheme was(More)
Text categorization is an important means to process automatically the information which increases exponentially. But due to the high dimensionality of the text corpus, many sophisticated classifiers can not be efficiently and effectively used in text categorization. So feature selection has become a research focus in text categorization. In this paper, we(More)
Feature selection is often considered as a key step in text categorization. In this paper, we proposed a new feature selection algorithm, named AD, which comprehensively measures the degree of relevance and distinction of terms occur in document set. We evaluated AD on three benchmark document collections, 20-Newsgroups, Reuters-21578 and WebKB, using two(More)
Text categorization is very vital in assisting people to process automatically the information which increases exponentially. But the high dimensionality of the vector space is a big hurdle in applying many sophisticated learning algorithms in text categorization. So feature selection has become a research focus in text categorization. In this paper, we(More)