Robust hybrid wrapper/filter biomarker discovery from gene expression data based on generalised island model
Feature ranking, which orders features by their individual importance, is a frequently used feature selection technique. When applied to high-dimensional, small-sample gene expression data, traditional feature ranking criteria tend to produce inconsistent rankings even under slight perturbations of the training samples, which hampers further studies such as biomarker identification. A widely used strategy for reducing these inconsistencies is multi-criterion combination, in which score normalisation is crucial. In this paper, three problems in existing methods are first analysed, and a new feature importance transformation algorithm based on resampling and permutation is then proposed for score normalisation. Experimental studies on four popular gene expression data sets show that multi-criterion combination based on the proposed score normalisation produces gene rankings with improved robustness.
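The abstract does not spell out the transformation itself, so the following is only a minimal sketch of one plausible way to normalise scores by resampling and permutation before combining criteria. It is not the algorithm proposed in the paper. The function names (`t_score`, `abs_corr`, `normalised_importance`), the two criteria, the stratified bootstrap, the pooled permutation null, and the synthetic data are all illustrative assumptions. Each raw score is mapped to the fraction of label-permuted scores it exceeds, which puts criteria with different scales (an unbounded t statistic and a correlation in [0, 1]) on a common [0, 1] scale.

```python
import numpy as np

def t_score(X, y):
    """Absolute Welch-style t statistic per feature (assumes labels 0/1)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def abs_corr(X, y):
    """Absolute Pearson correlation between each feature and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def normalised_importance(X, y, criterion, n_resamples=20, n_permutations=20, seed=0):
    """Hypothetical transformation: average, over stratified bootstrap
    resamples, of the fraction of permuted-label scores that each
    observed score exceeds. Returns one value in [0, 1] per feature."""
    rng = np.random.default_rng(seed)
    classes = [np.flatnonzero(y == c) for c in np.unique(y)]
    total = np.zeros(X.shape[1])
    for _ in range(n_resamples):
        # Resample within each class so that class sizes are preserved.
        idx = np.concatenate([rng.choice(c, size=c.size, replace=True) for c in classes])
        Xb, yb = X[idx], y[idx]
        observed = criterion(Xb, yb)
        # Null scores: the same criterion with labels shuffled, pooled over features.
        null = np.sort(np.concatenate(
            [criterion(Xb, rng.permutation(yb)) for _ in range(n_permutations)]))
        total += np.searchsorted(null, observed, side="left") / null.size
    return total / n_resamples

# Synthetic stand-in for expression data: 30 samples, 50 genes,
# with the first three genes shifted in class 1 (assumed, not from the paper).
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 15)
X = rng.normal(size=(30, 50))
X[y == 1, :3] += 4.0

# Multi-criterion combination: average the normalised scores, then rank.
scores = [normalised_importance(X, y, c) for c in (t_score, abs_corr)]
combined = np.mean(scores, axis=0)
ranking = np.argsort(-combined, kind="stable")
print(ranking[:3])
```

Averaging is used here as the simplest combination rule; any other rule that expects comparable inputs could be substituted once the scores share a scale.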