Mixed feature selection based on granulation and approximation
As the credit industry has grown rapidly, bank credit departments have collected a huge amount of consumer credit data, and credit scoring has become a very important issue. Credit datasets usually contain many redundant features, which lowers the accuracy and raises the complexity of the credit scoring model, so effective feature selection methods are necessary for credit datasets with a large number of features. This paper compares seven well-known feature selection methods for credit scoring: the t-test, principal component analysis (PCA), factor analysis (FA), stepwise regression, rough sets (RS), classification and regression trees (CART), and multivariate adaptive regression splines (MARS). A support vector machine (SVM) is used as the classification model. Two credit scoring datasets are used in order to provide a reliable conclusion. The experimental results show that CART and MARS outperform the other methods in terms of overall accuracy, type I error, and type II error.
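The three evaluation measures named above can be illustrated with a minimal sketch. The function name `credit_scoring_errors`, the label encoding (1 for a bad applicant, 0 for a good one), and the error convention are assumptions made for illustration. The text does not define which misclassification counts as type I, and the credit scoring literature uses both conventions.

```python
def credit_scoring_errors(y_true, y_pred, bad_label=1):
    """Return (overall accuracy, type I error, type II error).

    Assumed convention (not stated in the source text):
      type I error  = share of good applicants predicted as bad
      type II error = share of bad applicants predicted as good
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    good_total = bad_total = good_as_bad = bad_as_good = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == bad_label:
            bad_total += 1
            if pred != bad_label:
                bad_as_good += 1  # bad applicant accepted
        else:
            good_total += 1
            if pred == bad_label:
                good_as_bad += 1  # good applicant rejected

    accuracy = 1 - (good_as_bad + bad_as_good) / len(y_true)
    # Guard against a class that is absent from the evaluation set.
    type_1 = good_as_bad / good_total if good_total else 0.0
    type_2 = bad_as_good / bad_total if bad_total else 0.0
    return accuracy, type_1, type_2
```

In a comparison like the one described, such a function would be applied to the SVM predictions obtained with each feature subset, so that the seven selection methods are ranked on the same three measures.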