ProbCD: enrichment analysis accounting for categorization uncertainty

@article{Vncio2007ProbCDEA,
  title={ProbCD: enrichment analysis accounting for categorization uncertainty},
  author={Ricardo Z. N. V{\^e}ncio and Ilya Shmulevich},
  journal={BMC Bioinformatics},
  year={2007},
  volume={8},
  pages={383}
}
Background: As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this…
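
For orientation, the classical contingency-table enrichment test that the abstract refers to can be sketched as an upper-tail hypergeometric p-value. This is a minimal illustration of the conventional test with fixed gene-to-category assignments, not ProbCD's uncertainty-aware method; the function name and parameter names are hypothetical and chosen only for this example.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_total, n_category, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_total    -- number of genes in the background (assumed name)
    n_category -- background genes annotated to the category
    n_selected -- genes in the list of interest (e.g. differentially expressed)
    n_overlap  -- selected genes that are annotated to the category
    """
    # The overlap can never exceed either margin of the 2x2 table.
    upper = min(n_category, n_selected)
    # Number of ways to draw the selected list from the background.
    denom = comb(n_total, n_selected)
    # Sum the probabilities of all overlaps at least as large as the observed one.
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers: 20 background genes, 5 in the category, 5 selected, 3 in both.
    print(hypergeom_enrichment_pvalue(20, 5, 5, 3))
```

Each gene here either belongs to the category or does not; the sketch has no way to express uncertain category membership, which is the gap the abstract points to.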

ProbFAST: Probabilistic Functional Analysis System Tool

TLDR
A web platform named ProbFAST for multiple differential expression (MDE) analysis, which uses Bayesian inference to identify key genes that are intuitively prioritized by means of probabilities and demonstrates its flexibility to obtain relevant genes biologically associated with normal and abnormal biological processes.

Generalized random set framework for functional enrichment analysis using primary genomics datasets

TLDR
A new statistical framework, generalized random set (GRS) analysis, is developed and validated for comparing the genomic signatures in two datasets without the need for gene categorization, and it showed dramatic improvement in the statistical power over other methods currently used in this setting.

Markov Chain Ontology Analysis (MCOA)

TLDR
A methodology based on Markov chain models and network analytic metrics can help detect the relevant signal within large, highly interdependent and noisy data sets and has been shown to generate superior performance on both real and simulated data relative to existing state-of-the-art approaches.

Optimization of gene set annotations via entropy minimization over variable clusters (EMVC)

TLDR
The proposed method, entropy minimization over variable clusters (EMVC), filters the annotations for each gene set to minimize a measure of entropy across disjoint gene clusters computed for a range of cluster sizes over multiple bootstrap resampled datasets.

Enrichment Analysis of Metabolic Pathways Using P-value Perturbation

TLDR
A new, easy to implement methodology for assessing the enrichment of biological pathways is proposed, which generalizes the applicability of enrichment analysis to other sources of omics data, based on a perturbed version of regular p-values.

LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data

TLDR
A logistic regression-based method for identifying predefined sets of biologically related genes enriched with (or depleted of) differentially expressed transcripts in microarray experiments that displayed robust behavior and improved statistical power compared with tested alternatives.

A general modular framework for gene set enrichment analysis

TLDR
This framework provides a meta-theory of gene set analysis that not only helps to gain a better understanding of the relative merits of each embedded approach but also facilitates a principled comparison and offers insights into the relative interplay of the methods.

Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists

TLDR
The survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.

Investigating the concordance of Gene Ontology terms reveals the intra- and inter-platform reproducibility of enrichment analysis

TLDR
This study used the MAQC data sets for systematically assessing the intra- and inter-platform concordance of GO terms enriched by Gene Set Enrichment Analysis (GSEA) and LRpath, and demonstrated that lists of DEGs with a high level of concordance ensure highly concordant enrichment results.

References

BayGO: Bayesian analysis of ontology term enrichment in microarray data

TLDR
The Bayesian model accounts for the fact that, eventually, not all the genes from a given category are observable in microarray data due to low intensity signal, quality filters, genes that were not spotted and so on and allows one to measure the statistical association between generic ontology terms and differential expression.

Enrichment or depletion of a GO category within a class of genes: which test?

TLDR
The relationships existing between these tests are clarified, in particular the equivalence between the hypergeometric test and Fisher's exact test, and the appropriateness of one- and two-sided P-values is discussed.

Protein classification using probabilistic chain graphs and the Gene Ontology structure

TLDR
Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms.

The Gaggle: An open-source software system for integrating bioinformatics software and data sources

TLDR
The Gaggle is described: a simple, open-source Java software environment that helps to solve the problem of software and database integration and identifies a putative ricin-like protein, made possible by simultaneous data exploration using a wide range of publicly available data and a variety of popular bioinformatics software tools.

Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

TLDR
An original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach and is more informative than either separate approach.

Analyzing gene expression data in terms of gene sets: methodological issues

TLDR
It is argued that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.

Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data

TLDR
Grouping Gene Ontology terms improves the interpretation of gene set enrichment for microarray data by finding groups of GO terms significantly over-represented amongst differentially expressed genes which are not found by Fisher's tests on individual GO terms.

Probabilistic annotation of protein sequences based on functional classifications

TLDR
In this framework, the Correspondence Indicators are defined as measures of relationship between sequence and function and two Bayesian approaches are formulated to estimate the probability for a sequence of unknown function to belong to a functional class.

GOLEM: an interactive graph-based gene-ontology navigation and analysis tool

TLDR
GOLEM (Gene Ontology Local Exploration Map), a visualization and analysis tool for focused exploration of the gene ontology graph, which allows the user to dynamically expand and focus the local graph structure of the Gene Ontology hierarchy in the neighborhood of any chosen term.

Extensions to gene set enrichment

TLDR
A well-defined procedure to address interpretation issues that can arise when gene sets have substantial overlap is provided, and it is shown how standard dimension reduction methods, such as PCA, can be used to help further interpret GSEA.