Corpus ID: 211010608

On the Consistency of Optimal Bayesian Feature Selection in the Presence of Correlations

Ali Foroughi pour and Lori A. Dalton
Optimal Bayesian feature selection (OBFS) is a multivariate supervised screening method designed from the ground up for biomarker discovery. In this work, we prove that Gaussian OBFS is strongly consistent under mild conditions, and provide rates of convergence for key posteriors in the framework. These results are of enormous importance, since they identify precisely what features are selected by OBFS asymptotically, characterize the relative rates of convergence for posteriors on different…

Theory of Optimal Bayesian Feature Filtering
This result provides conditions where OBF is guaranteed to identify the correct feature set given enough data, and it justifies the use of OBF in non-design settings where its assumptions are invalid.
Optimal Bayesian feature selection
  • Lori A. Dalton
  • Computer Science
  • 2013 IEEE Global Conference on Signal and Information Processing
  • 2013
This work begins to address optimal feature selection in a Bayesian framework via a sparsity-inducing prior that assumes the number of “good” features is small, and derives expressions for the sample-conditioned probability mass over good feature sets.
Optimal Bayesian feature selection on high dimensional gene expression data
This work proposes two suboptimal feature selection algorithms based on optimal Bayesian feature selection theory that perform very well with relatively low computational burden, thus being ideal for molecular biomarker discovery.
What should be expected from feature selection in small-sample settings
These questions are addressed using three classification rules (linear discriminant analysis, linear support vector machine, and k-nearest-neighbor classification) and feature selection via sequential floating forward search and the t-test. It is concluded that one cannot expect to find a feature set whose error is close to optimal.
The peaking phenomenon in the presence of feature-selection
Massive simulation in a high-performance computing environment is used to produce a large library of error versus feature size curves. It is concluded that one should be wary of applying peaking results found in the absence of feature selection to settings in which feature selection is employed.
A review of the stability of feature selection techniques for bioinformatics data
The role of stability in feature selection with DNA microarray data is introduced, various ways of improving feature ranking stability are listed, and feature selection techniques are discussed, specifically explaining ensemble feature ranking and presenting various ensemble feature ranking aggregation methods.
Supervised, Unsupervised, and Semi-Supervised Feature Selection: A Review on Gene Selection
The basic taxonomy of feature selection is presented, and the state-of-the-art gene selection methods are reviewed by grouping the literature into three categories: supervised, unsupervised, and semi-supervised.
Feature Selection
This survey revisits feature selection research from a data perspective, reviews representative feature selection algorithms for conventional data, structured data, heterogeneous data, and streaming data, and categorizes them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based, and statistical-based.
Applications of protein microarrays for biomarker discovery
Applications of protein microarray technologies that offer unique opportunities to find novel biomarkers are discussed.
Cancer biomarkers: can we turn recent failures into success?
  • E. Diamandis
  • Medicine
  • Journal of the National Cancer Institute
  • 2010
This commentary discusses a plethora of parameters before, during, and after sample analysis that can complicate biomarker discovery and validation and lead to "false discovery".