Empirical Normalization for Quadratic Discriminant Analysis and Classifying Cancer Subtypes

  • Mark A. Kon, Nikolay Nikolaev
  • Published 18 December 2011
  • Computer Science
  • 2011 10th International Conference on Machine Learning and Applications and Workshops
We introduce a new discriminant analysis method (Empirical Discriminant Analysis, or EDA) for binary classification in machine learning. Given a dataset of feature vectors, this method defines an empirical feature map that transforms the training and test data into new data whose components have Gaussian empirical distributions. This map is an empirical version of the Gaussian copula used in probability and mathematical finance. The purpose is to form a feature-mapped dataset as close as possible…
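The component-wise map described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the map for each feature is the inverse standard Gaussian CDF applied to that feature's empirical CDF, fitted on pooled training data. The names `fit_empirical_map` and `transform_dataset`, the tie handling, and the rescaling that keeps probabilities away from 0 and 1 are all assumptions made for illustration. The discriminant analysis step that would follow is omitted.

```python
from bisect import bisect_right
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, standard deviation 1


def fit_empirical_map(train_column):
    """Return a function sending a raw feature value to a Gaussian score.

    The score is Phi^{-1}(F_hat(x)), where F_hat is the empirical CDF of
    the training values of one feature (hypothetical helper).
    """
    sorted_vals = sorted(train_column)
    n = len(sorted_vals)

    def to_gaussian(x):
        # Number of training values <= x, i.e. n * F_hat(x).
        rank = bisect_right(sorted_vals, x)
        # Assumed rescaling: dividing by n + 1 and flooring the rank at 0.5
        # keeps the probability strictly inside (0, 1), so the inverse
        # Gaussian CDF is finite even for test values below the training range.
        p = max(rank, 0.5) / (n + 1)
        return _STD_NORMAL.inv_cdf(p)

    return to_gaussian


def transform_dataset(train_rows, test_rows):
    """Fit one map per feature on the training rows and apply it to both sets."""
    maps = [fit_empirical_map(column) for column in zip(*train_rows)]

    def apply(rows):
        return [[m(value) for m, value in zip(maps, row)] for row in rows]

    return apply(train_rows), apply(test_rows)


if __name__ == "__main__":
    train = [[2.0, 10.0], [5.0, 30.0], [9.0, 20.0]]
    test = [[-100.0, 25.0]]
    mapped_train, mapped_test = transform_dataset(train, test)
    print(mapped_train)
    print(mapped_test)
```

Fitting the map on the training data only and reusing it for the test data keeps the test set out of the normalization. Whether the original method fits the map per class or on pooled data is not stated in the truncated abstract, so the pooled choice here is an assumption.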
5 Citations

PCA-QDA Model Selection for Detecting NS1 Related Diseases from SERS Spectra of Salivary Mixtures
This work intends to define an optimal classifier model for Quadratic Discriminant Analysis (QDA), optimized with Principal Component Analysis (PCA), to distinguish between positive and negative NS1-adulterated samples from salivary SERS spectra.
Human activity recognition: classifier performance evaluation on multiple datasets
An evaluation of both classic and less commonly known classifiers on three distinct human activity recognition datasets, freely available in the UCI Machine Learning Repository, shows that even under heavy restrictions it is possible to achieve classification accuracy of up to 98.16%.
Using Copulas for Bayesian Meta-analysis
A Bayesian model is proposed for meta-analysis of treatment effectiveness data which are generally discrete Binomial and sparse and a bivariate class of priors is imposed to accommodate a wide range of heterogeneity between the multicenter clinical trials involved in the study.
References


Simple decision rules for classifying human cancers from gene expression profiles
The k-TSP classifier performs as efficiently as Prediction Analysis of Microarray and support vector machine, and outperforms other learning methods (decision trees, k-nearest neighbour and naïve Bayes). It is easy to interpret because the classifier involves only a small number of informative genes.
Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction
The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning and appears to be a better feature selector than Fisher and RFE in some of the cancer datasets.
Microarray Classification from Several Two-Gene Expression Comparisons
The design of the k-TSP (k-disjoint Top Scoring Pairs) classifier is motivated by the special scenario encountered in molecular cancer classification based on the mRNA concentrations provided by gene microarray data and only depends on expression comparisons among selected pairs of genes.
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Classifying Gene Expression Profiles from Pairwise mRNA Comparisons
The TSP classifier achieves prediction rates with standard cancer data that are as high as those of previous studies which use considerably more genes and complex procedures and is parameter-free, thus avoiding the type of over-fitting and inflated estimates of performance that result when all aspects of learning a predictor are not properly cross-validated.
Discriminant Analysis and Statistical Pattern Recognition
Provides a systematic account of the subject area, concentrating on the most recent advances in the field. While the focus is on practical considerations, both theoretical and practical issues are…
Distribution modeling and simulation of gene expression data
Expression profiling predicts outcome in breast cancer
The predictive capacity of the prognosis classifier cannot be explained by its association with, among other factors, ER status as suggested, and the predictive power of the authors' prognosis reporters may be reduced in an adjuvantly treated patient group.