iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space
Combining network theory with the calculation of topological indices (TIs) makes it possible to relate the molecular structure of large molecules such as genes and proteins to their properties at the biological level. Models of this type can be regarded as quantitative structure-activity relationships (QSAR) for biopolymers. In the present work, a QSAR model is reported for proteins related to human colorectal cancer (HCC) and encoded by genes that were identified experimentally by Sjöblom et al. [2006. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268-274] among more than 10,000 human genes. The 69 proteins related to human colorectal cancer (HCCp) and a control group of 200 proteins not related to HCC (no-HCCp) were represented as HP lattice-type networks. From the generated graphs, we calculated a set of electrostatic-potential-type descriptors (xi(k)) which, through linear discriminant analysis (LDA), yield a QSAR model that differentiates HCCp from no-HCCp proteins with a relatively high percentage of correct classification (higher than 80%). The purpose of this study is to help predict the possible involvement of a given gene and/or protein (biomarker) in colorectal cancer. Several validation procedures were carried out to corroborate the stability of the model, including cross-validation (CV) series and the evaluation of an additional series of 200 no-HCCp proteins. This biostatistical methodology could be applied to predict human colorectal cancer biomarkers and to better understand the biological aspects of this disease.
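The classification and validation step described above can be illustrated with a minimal sketch of a two-class linear discriminant with leave-one-out cross-validation. It is not the model reported here: the two-dimensional feature vectors are synthetic placeholders standing in for descriptor values, the function names (`fit_lda`, `loo_accuracy`) are hypothetical, and equal class priors are assumed, which the unbalanced 69 versus 200 design would not necessarily justify.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def mean(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def dot(a: Vector, b: Vector) -> float:
    return a[0] * b[0] + a[1] * b[1]


def within_scatter(groups: Sequence[Sequence[Vector]]) -> List[List[float]]:
    """2x2 within-class scatter matrix, summed over all groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts in groups:
        m = mean(pts)
        for p in pts:
            dx, dy = p[0] - m[0], p[1] - m[1]
            s[0][0] += dx * dx
            s[0][1] += dx * dy
            s[1][0] += dy * dx
            s[1][1] += dy * dy
    return s


def fit_lda(pos: Sequence[Vector], neg: Sequence[Vector]) -> Tuple[Vector, float]:
    """Fisher discriminant: returns (w, t); x is 'positive' when w.x > t."""
    m_pos, m_neg = mean(pos), mean(neg)
    s = within_scatter([pos, neg])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0.0:
        raise ValueError("singular within-class scatter matrix")
    dx, dy = m_pos[0] - m_neg[0], m_pos[1] - m_neg[1]
    # w = S^-1 (m_pos - m_neg), using the closed-form 2x2 inverse.
    w = ((s[1][1] * dx - s[0][1] * dy) / det,
         (-s[1][0] * dx + s[0][0] * dy) / det)
    # Midpoint of the projected class means (assumes equal priors).
    t = 0.5 * (dot(w, m_pos) + dot(w, m_neg))
    return w, t


def loo_accuracy(pos: Sequence[Vector], neg: Sequence[Vector]) -> float:
    """Leave-one-out cross-validation: refit without each sample, then test it."""
    data = [(p, True) for p in pos] + [(n, False) for n in neg]
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        w, t = fit_lda([v for v, l in train if l],
                       [v for v, l in train if not l])
        if (dot(w, x) > t) == label:
            correct += 1
    return correct / len(data)


# Synthetic placeholder data, NOT values from the study.
POS = [(2.0, 2.1), (2.5, 1.9), (3.0, 2.6), (2.2, 2.8)]
NEG = [(0.0, 0.2), (0.5, -0.1), (-0.3, 0.4), (0.2, 0.9)]

if __name__ == "__main__":
    print("LOO accuracy on toy data:", loo_accuracy(POS, NEG))
```

With real descriptors the feature vectors would have more than two dimensions, so a general matrix inverse (or a library implementation of LDA) would replace the closed-form 2x2 inverse used here.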