Gene Selection for Cancer Classification using Support Vector Machines

  • Isabelle Guyon, Jason Weston, Stephen D. Barnhill, Vladimir Naumovich Vapnik
    Machine Learning
DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. […] We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method…
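The recursive elimination loop named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a linear SVM and uses the squared weight of each feature as the ranking criterion. It also swaps in plain full-batch subgradient descent on the hinge loss for a proper SVM solver. The function names (`train_linear_svm`, `svm_rfe`), the hyperparameters, and the toy data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) by full-batch subgradient descent on the
    L2-regularised hinge loss (a stand-in for a real SVM solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample lies inside the margin: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y):
    """Return feature indices ordered from most to least relevant.

    Each round retrains the classifier on the surviving features and
    removes the one with the smallest squared weight (assumed criterion)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    # The last survivor is ranked first; earlier eliminations rank lower.
    return remaining + eliminated[::-1]


# Toy data (hypothetical): feature 0 is strongly informative, feature 1
# weakly informative, feature 2 is uncorrelated with the label.
X = [
    [2.0, 0.5, 0.1],
    [2.0, 0.5, -0.1],
    [-2.0, -0.5, -0.1],
    [-2.0, -0.5, 0.1],
]
y = [1, 1, -1, -1]

ranking = svm_rfe(X, y)
print(ranking)  # feature indices, most relevant first
```

Removing one feature per round is the simplest variant. With thousands of genes, removing a chunk of features per round would cut the number of retraining passes, at the cost of a coarser ranking.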
Accurate molecular classification of cancer using simple rules
In cancerous gene expression datasets, a small number of genes, even one or two if selected correctly, is capable of achieving an ideal cancer classification effect, which means that very simple rules may perform well for cancerous class prediction.
A greedy algorithm for gene selection based on SVM and correlation
A novel algorithm for gene selection that combines Support Vector Machines (SVMs) with gene correlations is presented, called GCI-SVM, which obtains a higher classification accuracy using a smaller number of selected genes than the well-known algorithms in the literature.
A hybrid approach for gene selection and classification using support vector machine
An ensemble feature selection technique which is a combination of Recursive Feature Elimination (RFE) and Based Bayes error Filter (BBF) for gene selection and Support Vector Machine (SVM) algorithm for classification is proposed.
Review on Feature Selection Techniques and the Impact of SVM for Cancer Classification using Gene Expression Profile
A review is presented of feature selection techniques that have been employed in microarray-based cancer classification, and of the predominant role of SVM in cancer classification.
Applying Data Mining Techniques for Cancer Classification from Gene Expression Data
  • J. Yeh
  • Computer Science
    2007 International Conference on Convergence Information Technology (ICCIT 2007)
  • 2007
Genetic algorithms (GA) with an initial solution provided by t-statistics (t-GA) are applied to select a group of relevant genes from cancer microarray data, and a decision-tree-based cancer classifier is built on top of these selected genes.
Gene signature selection for cancer prediction using an integrated approach of genetic algorithm and support vector machine
An integrated approach of support vector machine (SVM) and genetic algorithm (GA) is proposed that can simultaneously optimize the feature subset and the classifier through a common solution coding mechanism and outperforms other existing methods in terms of classification accuracy.
Machine learning models for lung cancer classification using array comparative genomic hybridization
It is concluded that gene copy numbers as measured by array CGH are, collectively, an excellent indicator of histological subtype.
A Scheme for Feature Selection from Gene Expression Data using Recursive Feature Elimination with Cross Validation and Unsupervised Deep Belief Network Classifier
  • Nimrita Koul, S. Manvi
  • Computer Science
    2019 3rd International Conference on Computing and Communications Technologies (ICCCT)
  • 2019
A two level scheme for feature selection and classification of cancers where genes are ranked using Recursive Feature Elimination and later these genes are used to pre-train an Unsupervised Deep Belief Network Classifier to classify the samples based on the selected genes.
A hybrid gene selection algorithm based on interaction information for microarray-based cancer classification
A new hybrid filter-wrapper gene subset selection algorithm, an improved modification of the authors' prior algorithm, is presented; it consistently outperforms prior gene selection algorithms in terms of classification accuracy while requiring only a small number of selected genes.


Support vector machine classification and validation of cancer tissue samples using microarray expression data
A new method is presented that uses support vector machines to analyse tissue samples and to flag mis-labeled or questionable tissue results; other machine learning methods are also shown to perform comparably to the SVM on many of those datasets.
Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.
  • U. Alon, N. Barkai, A. Levine
  • Biology
    Proceedings of the National Academy of Sciences of the United States of America
  • 1999
A two-way clustering method is reported for analyzing a data set consisting of the expression patterns of different cell types, revealing broad coherent patterns that suggest a high degree of organization underlying gene expression in these tissues.
Knowledge-based analysis of microarray gene expression data by using support vector machines.
A method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments, based on the theory of support vector machines (SVMs), to predict functional roles for uncharacterized yeast ORFs based on their expression data is introduced.
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Cluster analysis and display of genome-wide expression patterns
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding that the standard correlation coefficient conforms well to the intuitive biological notion of what it means for two genes to be “coexpressed”.
Distinctive gene expression patterns in human mammary epithelial cells and breast cancers.
The results support the feasibility and usefulness of this systematic approach to studying variation in gene expression patterns in human cancers as a means to dissect and classify solid tumors.
Gene functional classification from heterogeneous data
This work considers the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons and proposes an SVM kernel function that is explicitly heterogeneous.
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling
It is shown that there is diversity in gene expression among the tumours of DLBCL patients, apparently reflecting the variation in tumour proliferation rate, host response and differentiation state of the tumour.
Altered expression of heterogenous nuclear ribonucleoproteins and SR factors in human colon adenocarcinomas.
This analysis investigates the alternative splicing pattern of the CD44 gene in specimens of nonfamilial colon adenocarcinomas at different stages of tumor progression and finds that the mRNA levels of different SR proteins in tumor specimens are different from, and usually lower than, those detected in samples of nonpathological tissue adjacent to the tumor.
An Experimental and Theoretical Comparison of Model Selection Methods
Three well-known model selection methods are compared in detail: a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV).