Development of biomarker classifiers from high-dimensional data

Songjoon Baek, Chen-An Tsai, James J. Chen
Briefings in Bioinformatics, 10(5)

Recent development of high-throughput technology has accelerated interest in the development of molecular biomarker classifiers for safety assessment, disease diagnostics and prognostics, and prediction of response for patient assignment. This article reviews and evaluates some important aspects and key issues in the development of biomarker classifiers. Development of a biomarker classifier for high-throughput data involves two components: (i) model building and (ii) performance assessment… 
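
The two components named in the abstract, model building and performance assessment, can be illustrated with a small sketch. This is not the authors' procedure: the simulated data, the standardized mean-difference ranking, the nearest-centroid classifier, and every function name below are assumptions chosen for illustration. The point it shows is that feature selection is repeated inside each cross-validation fold, so the held-out samples never influence which features are chosen.

```python
import random
from statistics import mean, pstdev

def make_data(n_per_class=20, n_features=200, n_informative=5, shift=2.0, seed=0):
    """Simulate two classes; only the first n_informative features differ (assumed setup)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y

def select_features(X, y, k):
    """Model building, step 1: rank features by absolute standardized mean difference."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(a) + pstdev(b) + 1e-12  # small constant avoids division by zero
        scores.append((abs(mean(a) - mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def fit_centroids(X, y, features):
    """Model building, step 2: per-class mean vector over the selected features."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids

def predict(centroids, features, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def cross_validate(X, y, k_features=5, n_folds=5, seed=0):
    """Performance assessment: k-fold CV with feature selection redone in every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        features = select_features(X_tr, y_tr, k_features)  # training samples only
        centroids = fit_centroids(X_tr, y_tr, features)
        correct += sum(predict(centroids, features, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

X, y = make_data()
accuracy = cross_validate(X, y)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Selecting features once on the full dataset and then cross-validating only the classifier would let the test samples leak into model building, which tends to make the accuracy estimate optimistic.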

A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics*

It is concluded that the univariate t test and the MWW test with multiple testing corrections are not applicable to data sets with small sample sizes, but their performance improves markedly with increasing sample size, up to a point at which they outperform the other methods.

Biomarker adaptive designs in clinical trials

This review is concerned with statistical aspects in the biomarker adaptive design for randomized clinical trials, which include the interaction test to identify predictive biomarkers, subgroup analysis, multiple testing and false discovery rate (FDR), classification of imbalanced class size data, sample size and power, and validation of the classification model.

Identifying Cancer Biomarkers from High-Throughput RNA Sequencing Data by Machine Learning

This work explores the identification of biomarker genes in 12 types of cancer, based on how well machine learning methods classify control and disease samples, and finds that the extreme learning machine achieves the best classification performance among the methods compared.

The Marker State Space (MSS) Method for Classifying Clinical Samples

A new strategy for developing biomarker panels that accounts for completely distinct patient subclasses is presented, using Marker State Space (MSS) for modeling highly divergent subclasses of patients, which may be adaptable for diverse applications.

A Jackknife and Voting Classifier Approach to Feature Selection and Classification

Voting classifiers in combination with a robust feature selection method such as the jackknife procedure offer an effective, simple and intuitive approach to feature selection and classification with a clear extension to clinical applications.

Consistency of predictive signature genes and classifiers.

The authors proposed a transferability index (T-index), a combined measure of prediction accuracy and consistency between two platforms, to assess the cross-platform consistency of the signature genes and classifier.

Prognostic and predictive signatures for treatment decisions.

A simulation experiment showed that the proposed models for identifying the biomarker sets S and U performed well, as did the procedure C(S,U) for subgroup identification.

Subgroup identification for treatment selection in biomarker adaptive design

The proposed DLDA-based classifier performs well in terms of sensitivity, specificity, positive and negative predictive values, and accuracy in the simulation data and the two cancer datasets, with superior accuracy compared to the ASD method.

Cross-study validation for the assessment of prediction algorithms

This work develops and implements a systematic approach to ‘cross-study validation’, to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets, and suggests that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation.

Assessment of performance of survival prediction models for cancer prognosis

Different performance metrics for evaluating a survival prediction model may lead to different conclusions about its discriminatory ability, and the cross-validated power of survival prediction models decreases as the training and test sets become less balanced.



Identifying High-Dimensional Biomarkers for Personalized Medicine via Variable Importance Ranking

A classification algorithm with the proposed ranking method is shown to be competitive with other selection methods for discovering genomic biomarkers underlying both adverse and efficacious outcomes for improving individualized treatment of patients for life-threatening diseases.

Development and Validation of Biomarker Classifiers for Treatment Selection.

R. Simon. Journal of Statistical Planning and Inference, 2008. (Biology, Medicine)

A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression

The results indicate that the multiclass classification problem is much more difficult than the binary one for gene expression datasets, because the data are high-dimensional and the sample size is small.

Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data

Discrimination methods for the classification of tumors based on gene expression data, including nearest-neighbor classifiers, linear discriminant analysis, and classification trees, are applied to datasets from three recently published cancer gene expression studies.

New algorithms for multi-class cancer diagnosis using tumor gene expression signatures

A classification algorithm is presented in which each tissue sample is treated as the center of a ball-shaped cluster; it performs slightly better than the published results for multi-class classifiers based on support vector machines on this data set.