Deep autoencoder neural networks for gene ontology annotation predictions

@inproceedings{Chicco2014DeepAN,
  title={Deep autoencoder neural networks for gene ontology annotation predictions},
  author={Davide Chicco and Peter Sadowski and Pierre Baldi},
  booktitle={Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics},
  year={2014}
}
  • D. Chicco, Peter Sadowski, P. Baldi
  • Published 20 September 2014
  • Computer Science
  • Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
The annotation of genomic information is a major challenge in biology and bioinformatics. Existing databases of known gene functions are incomplete and prone to errors, and the biomolecular experiments needed to improve these databases are slow and costly. While computational methods are not a substitute for experimental verification, they can help in two ways: algorithms can aid in the curation of gene annotations by automatically suggesting inaccuracies, and they can predict previously… 

Protein function prediction with gene ontology: from traditional to deep learning models

TLDR
This work reviewed the currently available computational GO annotation methods for proteins, ranging from conventional to deep learning approaches, selected some suitable predictors from among the reviewed tools, and conducted a small comparison of their performance using a worldwide challenge dataset.

DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks

TLDR
DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, is proposed as a solution to Gene Ontology based protein function prediction and the neural network architecture of DEEPred can also be applied to the prediction of the other types of ontological associations.

Protein Function Prediction Using Deep Restricted Boltzmann Machines

TLDR
Deep restricted Boltzmann machines (DRBM), a representative deep learning technique, are investigated for predicting the missing functional annotations of partially annotated proteins; the method runs faster than the methods it is compared against.

Validation Pipeline for Computational Prediction of Genomics Annotations

TLDR
This work proposes a validation procedure based on three different sub-phases, which can assess the precision of any algorithm's predictions with a reliable degree of accuracy, and shows some validation results obtained for Gene Ontology annotations of Homo sapiens genes that demonstrate the effectiveness of this approach.

Gene function finding through cross-organism ensemble learning

TLDR
GeFF (Gene Function Finder), a novel cross-organism ensemble learning method able to reliably predict new GO Annotations of a target organism from GO annotations of another source organism evolutionarily related and better studied, is presented.

Novelty Indicator for Enhanced Prioritization of Predicted Gene Ontology Annotations

TLDR
A novelty indicator is proposed that states the level of “originality” of the Gene Ontology (GO) term annotations predicted for a specific gene; together with previously introduced prediction steps, it helps prioritize the most novel and interesting predicted annotations, improving the accuracy and relevance of an annotation prediction and prioritization pipeline.

Ontology-Based Prediction and Prioritization of Gene Functional Annotations

TLDR
A computational pipeline that uses different semantic and machine learning methods to predict novel ontology-based gene functional annotations is proposed and a new semantic prioritization rule is introduced to categorize the predicted annotations by their likelihood of being correct.

Gene Prediction Using Unsupervised Deep Networks

TLDR
This is the first proposal for an unsupervised signal sensor for gene prediction, and the first time Convolutional AutoEncoders have been used for feature extraction on DNA sequences.

Validation Procedures for Predicted Gene Ontology Annotations

TLDR
This paper illustrates and compares three effective validation procedures that, together, can state the precision of any algorithm's predictions with a reliable degree of accuracy, and shows some validation results generated on Gene Ontology datasets of Homo sapiens gene annotations that demonstrate the effectiveness of these techniques.

A Literature Review of Gene Function Prediction by Modeling Gene Ontology

TLDR
It is concluded that there remain many largely overlooked but important topics for future research in gene function prediction that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive GO terms and quantifying semantic similarities.
...

References

SHOWING 1-10 OF 30 REFERENCES

Semantically improved genome-wide prediction of Gene Ontology annotations

TLDR
A novel prediction algorithm that incorporates gene clustering based on gene functional similarity computed on Gene Ontology annotations is proposed, and both prediction methods are tested by k-fold cross-validation on two organism genomes.

Probabilistic Latent Semantic Analysis for prediction of Gene Ontology annotations

TLDR
The effectiveness of the pLSAnorm prediction method, which uses a modified Probabilistic Latent Semantic Analysis algorithm, is demonstrated by k-fold cross-validation on the GO annotations of two organisms, Gallus gallus and Bos taurus.

Predicting gene function from patterns of annotation.

TLDR
The Gene Ontology (GO) Consortium has produced a controlled vocabulary for annotation of gene function that is used in many organism-specific gene annotation databases, and the relationships among GO attributes are modeled with decision trees and Bayesian networks.

Predicting Novel Human Gene Ontology Annotations Using Semantic Analysis

TLDR
A technique is described that improves the previous method for predicting novel GO annotations by extracting implicit semantic relationships between genes and functions by using a vector space model and a number of weighting schemes in addition to the previous latent semantic indexing approach.

Improved Biomolecular Annotation Prediction through Weighting Scheme Methods

TLDR
This paper proposes an improvement to two annotation prediction methods based on vectorial semantic analysis; the improvement relies on a preparatory weighting of the set of considered annotations, and the paper demonstrates the effectiveness of the weighting approach.

Protein Function Prediction with Incomplete Annotations

TLDR
This work proposes a Protein Function Prediction method with Weak-label Learning (ProWL) and its variant ProWL-IF, which can replenish the missing functions of proteins and makes use of the knowledge that a protein cannot have certain functions, to boost the performance of protein function prediction.

Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature.

TLDR
It is concluded that statistical methods may be used to assign GO codes and may be useful for the difficult task of reassignment as terminology standards evolve over time.

Latent Dirichlet Allocation based on Gibbs Sampling for gene function prediction

TLDR
Two variants of the known Latent Dirichlet Allocation algorithm applied to the prediction of gene annotations are proposed, using the collapsed Gibbs Sampling method during the training phase and two distinct initialization approaches to adapt the LDA mathematical model to the biomolecular annotation scenario.

A semantic analysis of the annotations of the human genome

TLDR
The technique is able to identify missing and inaccurate annotations in existing annotation databases, and thus help improve their accuracy, and is used to analyze and improve the quality of the data of any public or private annotation database.

Semantic Analysis of Genome Annotations using Weighting Schemes

TLDR
A technique is described that improves a previous method for extracting implicit semantic relationships between genes and functions by adding a number of weighting schemes to the previous latent semantic indexing approach and is used to analyze the current annotations of the human genome.