Estimation of distribution algorithms as logistic regression regularizers of microarray classifiers.

@article{Bielza2009EstimationOD,
  title={Estimation of distribution algorithms as logistic regression regularizers of microarray classifiers},
  author={Concha Bielza and V{\'i}ctor Robles and Pedro Larra{\~n}aga},
  journal={Methods of Information in Medicine},
  year={2009},
  volume={48},
  number={3},
  pages={236--241}
}
OBJECTIVES The "large k (genes), small N (samples)" phenomenon complicates the problem of microarray classification with logistic regression. The indeterminacy of the maximum likelihood solutions, multicollinearity among predictor variables and data over-fitting cause unstable parameter estimates. Moreover, computational problems arise due to the large number of predictor variables (genes). Regularized logistic regression excels as a solution. However, the difficulties found here involve an…
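The abstract names regularized logistic regression, fitted with an estimation of distribution algorithm (EDA), as the remedy for the "large k, small N" setting. The sketch below is a hypothetical illustration of that general idea, not the paper's actual algorithm: a Gaussian UMDA-style continuous EDA searching for coefficients that maximize an L2-penalized logistic log-likelihood on a toy high-dimensional data set. The penalty choice, the EDA variant, and all parameter values are assumptions made for illustration.

import numpy as np

def penalized_log_likelihood(beta, X, y, lam):
    # L2 (ridge) penalized logistic log-likelihood; the ridge penalty is an
    # assumption for this sketch, not necessarily the regularizer used in the paper.
    z = X @ beta
    ll = np.sum(y * z - np.logaddexp(0.0, z))
    return ll - lam * np.sum(beta ** 2)

def umda_c(X, y, lam=1.0, pop_size=100, n_select=30, n_gen=50, seed=0):
    # Gaussian UMDA-style continuous EDA: sample coefficient vectors, keep the
    # best ones, refit a univariate normal per coefficient, and repeat.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mu, sigma = np.zeros(d), np.ones(d)
    best, best_fit = np.zeros(d), -np.inf
    for _ in range(n_gen):
        pop = rng.normal(mu, sigma, size=(pop_size, d))
        fits = np.array([penalized_log_likelihood(b, X, y, lam) for b in pop])
        elite = pop[np.argsort(fits)[-n_select:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-8
        if fits.max() > best_fit:
            best_fit, best = fits.max(), pop[fits.argmax()]
    return best

# Toy "large k, small N" setting: 200 genes, 40 samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 200))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=40) > 0).astype(float)
beta_hat = umda_c(X, y, lam=5.0)

In the paper itself the EDA plays the role of the regularizer, so the exact objective and probabilistic model differ; the sketch only conveys the general interplay between an EDA search and a penalized logistic regression fit.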

Citations

Chapter 6 Estimation of Distribution Algorithms in Gene Expression Data Analysis
TLDR
This chapter provides an overview of different existing EDAs, reviews some of their applications in bioinformatics, and finally discusses in more detail a specific problem that has been solved with this method.
Regularized continuous estimation of distribution algorithms
TLDR
The results show that the optimization performance of the proposed RegEDAs is less affected by the increase in problem size than that of other EDAs, and they are able to obtain significantly better optimization values for many of the functions in high-dimensional settings.
A review of estimation of distribution algorithms in bioinformatics
TLDR
A basic taxonomy of EDA techniques is set out, underlining the nature and complexity of the probabilistic model of each EDA variant, and emphasizing the EDA paradigm's potential for further research in this domain.
Scaling Up Estimation of Distribution Algorithms for Continuous Optimization
TLDR
EDA-MCC is the first successful instance of multivariate probabilistic model-based EDAs that can be effectively applied to a general class of up to 500-D problems and outperforms some newly developed algorithms designed specifically for large-scale optimization.
A latent space-based estimation of distribution algorithm for large-scale global optimization
TLDR
A latent space-based EDA (LS-EDA) transforms the multivariate probabilistic model of Gaussian-based EDA into its principal component latent subspace with lower dimensionality, and outperforms the other methods on the benchmark functions with overlapping and nonseparable variables.
Identification of biomarkers that distinguish chemical contaminants based on gene expression profiles
TLDR
A new feature selection algorithm called the gradient method was developed; it achieved relatively high training and prediction classification accuracy, with the lowest overfitting rate of the methods tested.
On novel approaches for classification. A proposal for an interdisciplinary debate.
  • A. Ziegler, Methods of Information in Medicine, 2010
TLDR
Standard statistical tests can be used to judge whether a novel classification scheme performs significantly better than a standard classifier: when two different classification schemes are applied to the same data set, each subject can be judged as correctly classified or not by each of the two classifiers.
Biomedical Data Mining
TLDR
The special topic of Methods of Information in Medicine on data mining in biomedicine is introduced, with selected papers from two workshops on Intelligent Data Analysis in bioMedicine (IDAMAP) held in Verona and Amsterdam.
...

References

SHOWING 1-10 OF 42 REFERENCES
Classification of microarray data with penalized logistic regression
TLDR
Penalized logistic regression performs well on a public data set (the MIT ALL/AML data) and is optimized with AIC (Akaike's Information Criterion), which essentially is a measure of prediction performance.
Classification using partial least squares with penalized logistic regression
TLDR
A new method combining partial least squares (PLS) and Ridge penalized logistic regression is proposed and the predictive performance of the resulting classification rule is illustrated on three data sets: Leukemia, Colon and Prostate.
Gene selection in cancer classification using sparse logistic regression with Bayesian regularization
TLDR
A simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior, and the improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step.
Optimizing logistic regression coefficients for discrimination and calibration using estimation of distribution algorithms
TLDR
This work presents a novel approach for fitting the logistic regression model based on estimation of distribution algorithms (EDAs), a tool from evolutionary computation, from a double perspective: likelihood-based to calibrate the model and AUC-based to discriminate between the different classes.
Classification of gene microarrays by penalized logistic regression.
Classification of patient samples is an important aspect of cancer diagnosis and treatment. The support vector machine (SVM) has been successfully applied to microarray cancer diagnosis problems.
Entropy-based gene ranking without selection bias for the predictive classification of microarray data
TLDR
A process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance as well as improving on alternative parametric RFE reduction strategies.
Regularized ROC method for disease classification and biomarker selection with microarray data
TLDR
The proposed method uses a sigmoid approximation to the area under the ROC curve as the objective function for classification and the threshold gradient descent regularization method for estimation and biomarker selection and yields parsimonious models with excellent classification performance.
Sparse multinomial logistic regression: fast algorithms and generalization bounds
TLDR
This paper introduces a true multiclass formulation based on multinomial logistic regression and derives fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces.
An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression
TLDR
This paper describes an efficient interior-point method for solving large-scale l1-regularized logistic regression problems, and shows how a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
...