A Survey of Computational Methods for Protein Function Prediction

@inproceedings{Shehu2016ASO,
  title={A Survey of Computational Methods for Protein Function Prediction},
  author={Amarda Shehu and Daniel Barbar{\'a} and Kevin Molloy},
  year={2016}
}
Rapid advances in high-throughput genome sequencing technologies have resulted in millions of protein-encoding gene sequences with no functional characterization. Current methods predict function from a protein’s sequence, often in the context of evolutionary relationships; from its three-dimensional structure or specific patterns in that structure; from its neighbors in a protein–protein interaction network; from microarray data; or from a combination of these types of data. Here we…
Protein function prediction with gene ontology: from traditional to deep learning models
TLDR
This work reviewed the currently available computational GO annotation methods for proteins, ranging from conventional to deep learning approaches, then selected some suitable predictors from among the reviewed tools and conducted a small comparison of their performance on a worldwide challenge dataset.
A Literature Review of Gene Function Prediction by Modeling Gene Ontology
TLDR
It is concluded that many important but largely overlooked topics remain for future research on gene function prediction that applies GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive numbers of GO terms, and quantifying semantic similarities.
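Quantifying semantic similarity between GO terms is often done with information-content measures. As a minimal sketch, assuming a toy ontology and made-up annotation counts (the terms "A", "A1" and so on are hypothetical, not real GO terms), Resnik similarity scores two terms by the information content of their most informative common ancestor:

```python
import math

# Toy GO-like DAG (hypothetical): term -> list of parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}

# Hypothetical annotation counts, already propagated so that a term's
# count includes the annotations of all its descendants.
COUNTS = {"root": 100, "A": 40, "B": 60, "A1": 10, "A2": 20}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t), with p(t) the term's frequency relative to the root."""
    return math.log(COUNTS["root"] / COUNTS[term])


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


print(round(resnik("A1", "A2"), 3))  # 0.916, from the shared ancestor "A"
print(resnik("A1", "B"))             # 0.0, only the root is shared
```

A full implementation would also need to parse the ontology file, handle its several relation types, and propagate real annotation counts to ancestor terms.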
Cross-Species Protein Function Prediction with Asynchronous-Random Walk
TLDR
A cross-species protein function prediction approach, AsyRW, performs an asynchronous random walk on a heterogeneous network; results show that individual walk lengths and the asynchronous random walk can effectively leverage the complementary annotations of different species, and AsyRW performs significantly better than other related, competitive methods.
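The general idea of propagating annotations by walking over a network can be illustrated with a standard random walk with restart (RWR). This is not AsyRW's asynchronous, per-species walk-length scheme, which the summary above does not specify; it is a minimal sketch assuming a single symmetric adjacency matrix and a restart probability of 0.3 (an arbitrary choice). Proteins annotated with a given GO term act as seeds, and high steady-state scores suggest candidate proteins for that term.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at the seed nodes."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0       # avoid division by zero for isolated nodes
    transition = adj / col_sums         # column-stochastic transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # restart distribution spread over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:  # stop once the L1 change is negligible
            return p_next
        p = p_next
    return p


# Path network 0-1-2-3; protein 0 is annotated with the term of interest.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
scores = random_walk_with_restart(adj, seeds=[0])
print(np.argsort(-scores))  # [0 1 2 3]: scores fall with distance from the seed
```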
Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning
TLDR
The aim was not to obtain the best-performing method for automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes; the models were found to outperform BLAST when predicting terms from the Biological Process and Cellular Component ontologies.
The field of protein function prediction as viewed by different domain scientists
TLDR
It is shown that the three core communities of experimental biologists, biocurators, and computational biologists have common but also idiosyncratic perspectives on the field, and that more productive and meaningful interaction between members of these core communities is necessary.
Gene function prediction in five model eukaryotes based on gene relative location through machine learning
TLDR
To the best of the authors' knowledge, this is the first work in which gene function prediction is successfully achieved in eukaryotic genomes using predictive features derived exclusively from the relative location of the genes.
The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
TLDR
The CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and the ability to manage big data in the era of large experimental screens.
GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank
TLDR
GOLabeler is proposed, which integrates five component classifiers trained on different features (GO term frequency, sequence alignment, amino acid trigrams, domains and motifs, and biophysical properties) in the framework of learning to rank (LTR), a new paradigm of machine learning that is especially powerful for multi-label classification.
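One of GOLabeler's feature types, amino acid trigrams, can be sketched as overlapping 3-mer counts over the 20 standard residues. This is a minimal sketch under assumptions: GOLabeler's exact normalization and feature handling are not described in the summary above, and the function name trigram_vector is hypothetical. Trigrams containing non-standard residues (such as X) are simply dropped.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 possible trigrams, in a fixed order so vectors are comparable.
TRIGRAMS = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]


def trigram_vector(sequence):
    """Count overlapping amino acid trigrams as a fixed-length feature vector."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    # Trigrams with non-standard residues never appear in TRIGRAMS, so they are ignored.
    return [counts.get(tri, 0) for tri in TRIGRAMS]


vec = trigram_vector("MKTAYIAK")
print(len(vec), sum(vec))  # 8000 6
```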
Machine Learning Configurations for Enhanced Human Protein Function Prediction Accuracy
TLDR
This research focuses on the sequence-derived features (SDF) approach to human protein function (HPF) prediction, which is critically analyzed with the WEKA data analysis tool.
GOLabeler: improving sequence‐based large‐scale protein function prediction by learning to rank
TLDR
GOLabeler is proposed, which integrates five component classifiers trained on different features (GO term frequency, sequence alignment, amino acid trigrams, domains and motifs, and biophysical properties) in the framework of learning to rank (LTR), a machine learning paradigm that is especially powerful for multilabel classification.
...
...

References

Showing references 1-10 of 407.
Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data
TLDR
A probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data is proposed and evaluated.
Global protein function prediction from protein-protein interaction networks
TLDR
The paper proposes assigning proteins to functional classes on the basis of their network of physical interactions, choosing the assignment that minimizes the number of interactions between proteins in different functional categories.
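The optimization criterion in the entry above can be made concrete on a toy network. The sketch below assumes one functional category per protein (a simplification) and uses exhaustive search, which is only feasible for a handful of unannotated proteins; it does not reproduce the paper's own optimization procedure. Protein names and categories are hypothetical.

```python
from itertools import product


def cross_category_edges(edges, labels):
    """Number of interactions whose two endpoints carry different functional labels."""
    return sum(labels[u] != labels[v] for u, v in edges)


def global_assignment(edges, known, unknown, categories):
    """Exhaustively label the unannotated proteins to minimize cross-category edges."""
    best_cost, best_labels = None, None
    for combo in product(categories, repeat=len(unknown)):
        labels = dict(known)
        labels.update(zip(unknown, combo))
        cost = cross_category_edges(edges, labels)
        if best_cost is None or cost < best_cost:  # keep the first minimum found
            best_cost, best_labels = cost, labels
    return best_labels, best_cost


edges = [("P1", "X"), ("P2", "X"), ("X", "Y"), ("Y", "P3")]
known = {"P1": "kinase", "P2": "kinase", "P3": "transport"}
labels, cost = global_assignment(edges, known, ["X", "Y"], ["kinase", "transport"])
print(labels["X"], labels["Y"], cost)  # kinase kinase 1
```

Here labeling (X, Y) as (kinase, transport) also costs 1; because the comparison is strict, ties are resolved by enumeration order. Unlike per-protein neighbor voting, the objective is evaluated over the whole network at once.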
Protein function prediction by massive integration of evolutionary analyses and multiple data sources
TLDR
A novel scoring function called COmbined Graph-Information Content similarity (COGIC) score is proposed for the comparison of predicted functional categories and benchmark data and it is found that molecular function predictions are more accurate than biological process assignments.
Analysis of protein function and its prediction from amino acid sequence
TLDR
It is found that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, with sequence identity (SID) having a surprisingly small influence across a broad range (30–100%).
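What "transfer of GO terms by pairwise sequence alignments" means can be sketched as copying the annotations of the most similar annotated protein. This is a minimal sketch assuming the alignment search has already been run and its results reduced to (subject ID, percent identity) pairs; the 30% default cutoff is an illustrative assumption echoing the lower end of the range above, not a recommended value, and the protein IDs and GO IDs in the usage example are for illustration only.

```python
def transfer_annotations(hits, annotations, min_identity=30.0):
    """Transfer GO terms from the highest-identity annotated hit above a cutoff.

    hits: list of (subject_id, percent_identity) pairs from an alignment search.
    annotations: dict mapping subject_id -> set of GO term IDs.
    """
    candidates = [(pid, sid) for sid, pid in hits
                  if sid in annotations and pid >= min_identity]
    if not candidates:
        return set()
    _, best = max(candidates)  # highest percent identity wins
    return set(annotations[best])


hits = [("P1", 45.0), ("P2", 82.5), ("P3", 20.0)]
annotations = {
    "P1": {"GO:0003824"},
    "P2": {"GO:0005515", "GO:0016301"},
    "P3": {"GO:0008150"},
}
print(sorted(transfer_annotations(hits, annotations)))  # ['GO:0005515', 'GO:0016301']
```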
Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps
TLDR
A network-flow based algorithm, FunctionalFlow, is developed that exploits the underlying structure of protein interaction maps in order to predict protein function and has improved performance over previous methods in predicting the function of proteins with few (or no) annotated protein neighbors.
A large-scale evaluation of computational protein function prediction
TLDR
Today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets, yet there is still considerable need for improvement of currently available tools.
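CAFA-style assessments commonly summarize accuracy with the protein-centric maximum F-measure (Fmax): for each score threshold, precision is averaged over proteins with at least one prediction at or above the threshold, recall is averaged over all benchmark proteins, and the best F-measure over thresholds is reported. The sketch below assumes predictions and ground truth have already been propagated to ancestor terms and that every benchmark protein has at least one true term; the threshold grid from 0.01 to 1.00 and the toy term names are assumptions.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: dict protein -> dict GO term -> score in [0, 1].
    truth: dict protein -> non-empty set of true GO terms (benchmark proteins).
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:  # precision only over proteins with a prediction
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))  # recall over all proteins
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


truth = {"P1": {"a", "b"}, "P2": {"c"}}
predictions = {"P1": {"a": 0.9, "b": 0.4, "x": 0.8}, "P2": {"c": 0.6}}
print(round(fmax(predictions, truth), 3))  # 0.909, reached at low thresholds
```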
Prediction of human protein function according to Gene Ontology categories
TLDR
A method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme is developed, based on sequence-derived protein features such as predicted post-translational modifications (PTMs), protein sorting signals, and physical/chemical properties calculated from the amino acid composition.
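Of the features listed above, those calculated from amino acid composition are the simplest to reproduce. As a minimal sketch (the function name is hypothetical), the composition vector gives the fraction of each of the 20 standard residues; physical/chemical descriptors can then be derived from it, while predicted PTMs and sorting signals require separate predictors not shown here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard amino acids in a protein sequence."""
    # Drop non-standard symbols (e.g. X, B, Z) before computing fractions.
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not seq:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


print(amino_acid_composition("AACD")["A"])  # 0.5
```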
Prediction of Protein Function Using Protein-Protein Interaction Data
TLDR
This paper develops a novel approach that employs the theory of Markov random fields to infer a protein's functions using protein-protein interaction data and the functional annotations of its interaction partners, and shows that this approach outperforms other available methods for function prediction based on protein interaction data.
Network-based prediction of protein function
TLDR
The current computational approaches for the functional annotation of proteins are described, including direct methods, which propagate functional information through the network, and module‐assisted methods, which infer functional modules within the network and use those for the annotation task.
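The simplest direct method is neighbor counting (majority voting): each unannotated protein receives votes for the functions of its annotated interaction partners. The sketch below assumes an unweighted, undirected interaction list and hypothetical function labels f1 and f2; it returns raw vote counts, which a real method would normalize or threshold.

```python
from collections import Counter, defaultdict


def majority_vote(edges, annotations):
    """Predict functions for unannotated proteins from annotated interaction partners."""
    neighbors = defaultdict(set)
    for u, v in edges:  # build an undirected adjacency list
        neighbors[u].add(v)
        neighbors[v].add(u)
    predictions = {}
    for protein in neighbors:
        if protein in annotations:
            continue  # only unannotated proteins receive predictions
        votes = Counter()
        for partner in neighbors[protein]:
            votes.update(annotations.get(partner, ()))  # one vote per partner term
        if votes:
            predictions[protein] = dict(votes)
    return predictions


edges = [("X", "A"), ("X", "B"), ("X", "C"), ("Y", "X")]
annotations = {"A": {"f1"}, "B": {"f1", "f2"}, "C": {"f1"}}
pred = majority_vote(edges, annotations)
print(pred["X"]["f1"], pred["X"]["f2"])  # 3 1
```

Protein Y gets no prediction because its only partner, X, is itself unannotated; propagation-style direct methods extend the idea beyond immediate neighbors.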
Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters
TLDR
This work identifies potential new members of many existing functional categories including 285 candidate proteins involved in transcription, processing and transport of non-coding RNA molecules and presents experimental validation confirming the involvement of several of these proteins in ribosomal RNA processing.
...
...