Incorporation of Biological Pathway Knowledge in the Construction of Priors for Optimal Bayesian Classification

@article{Esfahani2014IncorporationOB,
  title={Incorporation of Biological Pathway Knowledge in the Construction of Priors for Optimal Bayesian Classification},
  author={Mohammad Shahrokh Esfahani and Edward R. Dougherty},
  journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
  year={2014},
  volume={11},
  pages={202--218}
}
  • M. S. Esfahani, E. Dougherty
  • Published 2014
  • Mathematics, Computer Science, Medicine
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics
Small samples are commonplace in genomic/proteomic classification, the result being inadequate classifier design and poor error estimation. The problem has recently been addressed by utilizing prior knowledge in the form of a prior distribution on an uncertainty class of feature-label distributions. A critical issue remains: how to incorporate biological knowledge into the prior distribution. For genomics/proteomics, the most common kind of knowledge is in the form of signaling pathways. Thus… 
Sample-based prior probability construction using biological pathway knowledge
TLDR
This paper addresses the problem of prior probability construction by proposing a series of optimization paradigms that utilize the incomplete prior information contained in pathways in the special case of a Normal-Wishart prior distribution on the mean and inverse covariance matrix of a Gaussian distribution.
Constructing Pathway-Based Priors within a Gaussian Mixture Model for Bayesian Regression and Classification
TLDR
Simulations demonstrate that the GMM REMLP prior yields better performance than the EM algorithm for small data sets, and the prior is applied to phenotype classification when the prior knowledge consists of colon cancer pathways.
Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors
TLDR
The proposed general prior construction framework makes the prior construction methodology more flexible, yielding better inference when proper prior knowledge exists and enabling superior classifier design using small, unstructured data sets.
An Optimization-Based Framework for the Transformation of Incomplete Biological Knowledge into a Probabilistic Structure and Its Application to the Utilization of Gene/Protein Signaling Pathways in Discrete Phenotype Classification
TLDR
This paper provides a framework and the mathematical tools to transform signaling pathways to prior probabilities governing uncertainty classes of feature-label distributions used in classifier design and proposes optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting.
MCMC implementation of the optimal Bayesian classifier for non-Gaussian models: model-based RNA-Seq classification
TLDR
This paper presents a hierarchical multivariate Poisson model and the associated optimal Bayesian classifier (OBC) for classifying samples using sequencing data, and demonstrates superior classification performance for both synthetic and real RNA-Seq datasets.
Incorporating prior knowledge induced from stochastic differential equations in the classification of stochastic observations
TLDR
This paper considers prior knowledge in the form of stochastic differential equations (SDEs) in integral form involving a drift vector and dispersion matrix and develops the optimal Bayesian classifier between two models.
Data Requirements for Model-Based Cancer Prognosis Prediction
TLDR
It is shown that static data can be superior for prognosis prediction when constrained to small samples, and performance is not sensitive to inaccuracies in the estimated network probabilities.
Intrinsically Bayesian robust classifier for single-cell gene expression trajectories in gene regulatory networks
TLDR
This paper studies expression-based classification under the assumption that single-cell measurements are sampled at a sufficient rate to detect regulatory timing, and derives the intrinsically Bayesian robust classifier to discriminate between wild-type and mutated networks based on expression trajectories.
Discrete optimal Bayesian classification with error-conditioned sequential sampling
TLDR
This paper proposes to forgo random sampling and instead use the prior knowledge and previously collected data to determine which class to sample from at each step of the sampling, and shows robustness even when the prior knowledge drifts away from the true distributions.
Optimal Bayesian Filtering for Biomarker Discovery: Performance and Robustness
TLDR
The utility of OBF in biomarker discovery using acute myeloid leukemia (AML) and colon cancer microarray datasets is evaluated, and it is shown that OBF is successful at identifying well-known biomarkers for these diseases that rank low under the moderated t-test.

References

SHOWING 1-10 OF 60 REFERENCES
Classifier design given an uncertainty class of feature distributions via regularized maximum likelihood and the incorporation of biological pathway knowledge in steady-state phenotype classification
TLDR
This paper derives approximate expressions for the first and second moments of the true error rate of the proposed classifier under the assumption of two widely used models for the uncertainty classes: ε-contamination and p-point classes.
The Illusion of Distribution-Free Small-Sample Classification in Genomics
TLDR
Owing to the epistemological dependence of classifiers on the accuracy of their estimated errors, scientifically meaningful distribution-free classification in high-throughput, small-sample biology is an illusion.
Accurate and Reliable Cancer Classification Based on Probabilistic Inference of Pathway Activity
TLDR
A new classification method based on probabilistic inference of pathway activities is proposed for the classification of breast cancer metastasis, and it is shown that it achieves higher accuracy and identifies more reproducible pathway markers compared to several existing pathway activity inference methods.
Optimal classifiers with minimum expected error within a Bayesian framework - Part I: Discrete and Gaussian models
TLDR
This paper derives optimal classifiers in discrete and Gaussian models, demonstrates their superior performance over popular classifiers within the assumed model, and applies the method to real genomic data.
Identification of Robust Pathway Markers for Cancer through Rank-Based Pathway Activity Inference
TLDR
Simulation results based on multiple breast cancer datasets show that the proposed inference method identifies better pathway markers that can predict breast cancer metastasis with higher accuracy and can lead to better classifiers with more consistent classification performance across independent datasets.
Bayesian Minimum Mean-Square Error Estimation for Classification Error—Part I: Definition and the Bayesian MMSE Error Estimator for Discrete Classification
TLDR
This investigation places classifier error estimation into the framework of optimal mean-square error (MSE) signal estimation in the presence of uncertainty, which results in a Bayesian approach to error estimation based on a parameterized family of feature-label distributions with the prior distribution of the parameters governing the choice of feature-label distribution.
Inferring Pathway Activity toward Precise Disease Classification
TLDR
It is shown that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer.
Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks
TLDR
Probabilistic Boolean Networks (PBN) are introduced that share the appealing rule-based properties of Boolean networks, but are robust in the face of uncertainty.
Conditioning-Based Modeling of Contextual Genomic Regulation
TLDR
A coarse mathematical model of the propagation of regulatory influence in such distributed, context-sensitive regulatory networks is constructed that allows a quantitative estimation of the amount of crosstalk and conditioning associated with a candidate regulatory gene taken from a set of genes that have been profiled over a series of samples where the candidate's activity varies.
Signal transduction pathway profiling of individual tumor samples
TLDR
This study shows that it is feasible to infer signal transduction pathway activity, in individual samples, from gene expression data, and these pathway activities are biologically relevant in the three cancer data sets.