An Improved Method of Feature Selection Based on Concept Attributes in Text Classification

@inproceedings{Liao2005AnIM,
  title={An Improved Method of Feature Selection Based on Concept Attributes in Text Classification},
  author={Shasha Liao and Minghu Jiang},
  booktitle={ICNC},
  year={2005}
}
Feature selection and weighting are two important parts of automatic text classification. In this paper we give a new method based on concept attributes. We use the DEF Terms of the Chinese word to extract concept attributes, and a Concept Tree (C-Tree) to give these attributes proper weights considering their positions in the C-Tree, as this information describes the expression powers of the attributes. If these attributes are too weak to sustain the main meanings of the words, they will be…

A NEW FEATURE SELECTION METHOD BASED ON CONCEPT EXTRACTION IN AUTOMATIC CHINESE TEXT CLASSIFICATION

TLDR
This paper uses HowNet, a Chinese semantic dictionary, to extract concepts from words as the feature set, because concepts can better reflect the meaning of the text.

A RBF Network for Chinese Text Classification Based on Concept Feature Extraction

TLDR
HowNet, a Chinese semantic dictionary, is used to extract concepts from words as the feature set. A combined feature set consisting of both sememes and Chinese words is constructed, and a CHI-MCOR weighting method is proposed according to the weighting theories and classification precision.

A Text Feature Selection Algorithm Based on Improved TFIDF

TLDR
This paper analyzes the TFIDF feature selection algorithm in depth and proposes a new TFIDF feature selection method based on Gini index theory, showing that the method is valid in improving the accuracy of text categorization.

Concept Features Extraction and Text Clustering Analysis of Neural Networks Based on Cognitive Mechanism

TLDR
This paper uses HowNet to extract concept attributes and proposes the CHI-MCOR method to build a feature set, which selects not only high-frequency words but also words of middle or low frequency that are important for text classification.

Automatic Chinese text categorization system based on mutual information

TLDR
Based on a theoretical and systematic analysis of the mutual information (MI) formula, an improved feature selection method is proposed that calculates the absolute values of MI and the difference between the maximum and the average of MI.
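The summary above gives no formula, so the following is only a minimal sketch of one possible reading: estimate pointwise MI between each term and each class from document counts, take absolute values, and score a term by the gap between its maximum and its average over classes. The function names, the add-one smoothing of the joint count, and the toy data are all assumptions for illustration, not the cited paper's method.

```python
import math
from collections import Counter, defaultdict


def mi_table(docs, labels):
    """Estimate MI(t, c) = log(P(t, c) / (P(t) * P(c))) from document counts.

    Assumption: the joint count is add-one smoothed so that MI stays defined
    for classes in which a term never occurs.
    """
    n = len(docs)
    class_count = Counter(labels)
    term_count = Counter()
    joint = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_count[term] += 1
            joint[term][label] += 1
    table = {}
    for term, n_t in term_count.items():
        table[term] = {
            c: math.log((joint[term][c] + 1) * n / (n_t * n_c))
            for c, n_c in class_count.items()
        }
    return table


def mi_scores(table):
    """Score each term by max |MI| minus mean |MI| over all classes."""
    scores = {}
    for term, per_class in table.items():
        values = [abs(v) for v in per_class.values()]
        scores[term] = max(values) - sum(values) / len(values)
    return scores


# Hypothetical toy corpus: two classes, two documents each.
docs = [["stock", "price"], ["stock", "market"],
        ["match", "price"], ["match", "goal"]]
labels = ["finance", "finance", "sports", "sports"]
scores = mi_scores(mi_table(docs, labels))
```

In this toy corpus, "stock" appears only in finance documents and receives a positive score, while "price" is spread evenly over both classes and scores zero, which is the behavior a max-minus-average criterion is meant to reward.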

Text Categorization Using Distributional Clustering and Concept Extraction

TLDR
This paper tries to solve the problem of TC through a two-step feature selection approach, which maintains the generalization ability of concept-extraction-based TC and makes full use of the occurrences of new words that are not found in the concept thesaurus.

Document representation combining concepts and words in Chinese text categorization

  • Chao Che, H. Teng
  • Computer Science
    2009 International Conference on Natural Language Processing and Knowledge Engineering
  • 2009
TLDR
This paper investigates a document representation combining words and concepts to integrate the advantages of the two types of representation, and takes the part of speech as the concept for words that are error-prone in word sense disambiguation, to reduce disambiguation mistakes.

A Comparison between the BP and RBF Networks

TLDR
Experimental results show that the BP and RBF networks outperform the Competitive network because they apply supervised learning, and that the RBF network deserves more attention in text classification.

EXTRACTION OF BIOMEDICAL INFORMATION FROM MEDLINE DOCUMENTS - A TEXT MINING APPROACH

TLDR
A novel text mining algorithm is proposed to enhance the performance of the information retrieval system, and is shown to improve the accuracy and precision of information retrieval.

References

A Comparative Study on Feature Selection in Text Categorization

TLDR
This paper finds strong correlations between the DF, IG, and CHI values of a term and suggests that DF thresholding, the simplest method with the lowest cost in computation, can be reliably used instead of IG or CHI when the computation of these measures is too expensive.
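As a minimal sketch of DF thresholding as described above: count the number of documents each term appears in, then keep only the terms whose document frequency reaches a cutoff. The function names, the `min_df` parameter, and the toy corpus are assumptions for illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count each term once per document
    return df


def select_by_df(docs, min_df=2):
    """Keep terms whose document frequency is at least min_df (hypothetical cutoff)."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


# Hypothetical toy corpus of tokenized documents.
docs = [
    ["stock", "market", "price"],
    ["market", "trade", "price", "price"],
    ["football", "match", "price"],
]
selected = select_by_df(docs, min_df=2)
```

Only one pass over the corpus and one counter are needed, which is why this method is the cheapest of the three measures mentioned.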

Machine learning in automated text categorization

TLDR
This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

TLDR
This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.

Zhao Mingsheng the feature selection in Text classification

  • An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Nello Cristianini, Cambridge, England, 2000. 4. Fabrizio Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys

Heyan, the comparative of feature selection in Chinese text classification

  • Journal of Chinese Information Processing, Vol. 1, No. 18, pp. 26-32
  • 2004

Shuying, the improved text classification method based on weighting adjusting

  • Tsinghua Science and Technology
  • 2003

Mingsheng the feature selection in Text classification

  • Journal of Chinese Information Processing, Vol. 18
  • 2004