DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

@article{Kulmanov2017DeepGOPP,
  title={DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier},
  author={Maxat Kulmanov and Mohammed Asif Khan and R. Hoehndorf},
  journal={Bioinformatics},
  year={2017},
  volume={34},
  pages={660--668}
}
Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. [...] Key Method: The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features…

SDN2GO: An Integrated Deep Learning Model for Protein Function Prediction

An integrated deep-learning-based classification model, named SDN2GO, that predicts protein functions and outperforms other methods on each sub-ontology of GO. It draws on natural language processing techniques to process domain information and uses a pre-trained deep learning sub-model to extract comprehensive domain features.

DeepFunc: A Deep Learning Framework for Accurate Prediction of Protein Functions from Protein Sequences and Interactions

A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence‐ and network‐derived information and outperforms current methods on the testing dataset and on the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset.

A Deep Learning Framework for Gene Ontology Annotations With Sequence- and Network-Based Information

  • Fuhao Zhang, Hong Song, Min Li
  • Computer Science, Biology
    IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • 2021
A deep learning framework, DeepGOA, is proposed to predict protein functions from protein sequences and protein-protein interaction (PPI) networks; the experimental results show that DeepGOA outperforms DeepGO and BLAST.

Protein function prediction with gene ontology: from traditional to deep learning models

This work reviewed the currently available computational GO annotation methods for proteins, ranging from conventional to deep learning approaches, then selected suitable predictors from among the reviewed tools and conducted a mini comparison of their performance using a worldwide challenge dataset.

Gene Ontology based protein functional annotation using pretrained embeddings

The experiment showed that protein embeddings created using pretrained transformer models can be used as a source of data for tasks involving sequence prediction, with a focus on protein functions.

Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction

A novel deep-learning method to predict Gene Ontology attributes of proteins through a triplet neural-network architecture embedded with pre-trained language models from protein sequences, demonstrating a new avenue for high-accuracy deep-learning function prediction that is applicable to large-scale protein function annotations from sequence alone.

PANDA2: protein function prediction using graph neural networks

A deep learning system named PANDA2 was developed to predict protein functions, which used the cutting-edge graph neural network to model the topology of the GO DAG and integrated the features generated by transformer protein language models.

A Deep Learning Framework for Predicting Protein Functions With Co-Occurrence of GO Terms

This work proposes a new deep learning model, named DeepPFP-CO, which uses Graph Convolutional Network (GCN) to explore and capture the co-occurrence of GO terms to improve the protein function prediction performance.

DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms

DeepGOZero is a machine learning model that improves predictions for functions with no or only a small number of annotations. It can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function.
...

FFPred 3: feature-based function prediction for all Gene Ontology domains

This update features a larger SVM library that extends its coverage to the cellular component sub-ontology for the first time, prompted by the establishment of a dedicated evaluation category within the Critical Assessment of Functional Annotation.

CombFunc: predicting protein function using heterogeneous data sources

The CombFunc web server makes Gene Ontology (GO)-based protein function predictions by combining ConFunc, an existing function prediction method, with other approaches that use protein sequence, gene expression and protein–protein interaction data.

Hierarchical Classification of Gene Ontology Terms Using the Gostruct Method

  • Artem Sokolov, A. Ben-Hur
  • Computer Science
    J. Bioinform. Comput. Biol.
  • 2010
This work proposes a method that directly predicts a full functional annotation of a protein by modeling the structure of the Gene Ontology hierarchy in the framework of kernel methods for structured-output spaces.

Roles for text mining in protein function prediction.

This chapter introduces two main strategies for association of function terms, represented as Gene Ontology terms, to proteins based on information in published articles, and a paradigm called LEAP-FS (Literature-Enhanced Automated Prediction of Functional Sites) in which literature mining is used to validate the predictions of an orthogonal computational protein function prediction method.

STRING v10: protein–protein interaction networks, integrated over the tree of life

Hierarchical and self-consistent orthology annotations are introduced for all interacting proteins, grouping the proteins into families at various levels of phylogenetic resolution in the STRING database.

Information-theoretic evaluation of predicted ontological annotations

An information-theoretic framework is proposed that uses a Bayesian network, structured according to the underlying ontology, to model the prior probability of a protein’s function and proposes a single statistic, referred to as semantic distance, that can be used to rank classification models.

Functional classification of CATH superfamilies: a domain-based approach for protein function annotation

A domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies, FunFHMMer, which generates more functionally coherent groupings of protein sequences than other domain-based protein classifications.

Predicting Protein Function by Multi-Label Correlated Semi-Supervised Learning

  • J. Q. Jiang, L. McQuay
  • Computer Science
    IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • 2012
This work proposes a new algorithm, Multi-label Correlated Semi-supervised Learning (MCSL), to incorporate the intrinsic correlations among functional classes into protein function prediction by leveraging the relationships provided by the PPI network and the functional class network.

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

A new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks that greatly outperforms existing methods and leads to much more accurate contact-assisted folding.