mGPfusion: predicting protein stability changes with Gaussian process kernel learning and data fusion

@article{Jokinen2018mGPfusionPP,
  title={mGPfusion: predicting protein stability changes with Gaussian process kernel learning and data fusion},
  author={Emmi Jokinen and Markus Heinonen and Harri L{\"a}hdesm{\"a}ki},
  journal={Bioinformatics},
  year={2018},
  volume={34},
  pages={i274--i283}
}
Motivation: Proteins are commonly used by the biochemical industry for numerous processes. Refining these proteins' properties via mutations causes stability effects as well. An accurate computational method to predict how mutations affect protein stability is necessary to facilitate efficient protein design. However, the accuracy of predictive models is ultimately constrained by the limited availability of experimental data. Results: We have developed mGPfusion, a novel Gaussian process (GP) method for…

DeepDDG: Predicting the Stability Change of Protein Point Mutations Using Neural Networks
TLDR
DeepDDG, a neural network-based method, is developed for use in the prediction of changes in the stability of proteins due to point mutations, which suggests that the buried hydrophobic area is the major determinant of protein stability.
The NK Landscape as a Versatile Benchmark for Machine Learning Driven Protein Engineering
TLDR
A unifying framework for ML-driven sequence-fitness prediction, using simulated (the NK model) and empirical sequence landscapes, to define four key performance metrics: interpolation within the training domain, extrapolation outside the training domain, robustness to sparse training data, and ability to cope with epistasis/ruggedness.
Towards guided mutagenesis: Gaussian process regression predicts MHC class II antigen mutant binding
TLDR
This work finds that prediction is most accurate for neutral residues at anchor residue sites without register shift, which holds relevance to predicting pMHCII binding and accelerating ASI design.
Machine-learning-guided directed evolution for protein engineering
TLDR
The steps required to build machine-learning sequence–function models and to use those models to guide engineering are introduced and the underlying principles of this engineering paradigm are illustrated with the help of case studies.
Machine learning in protein engineering
TLDR
This review introduces the steps required to collect protein data, train machine-learning models, and use trained models to guide engineering and makes recommendations at each stage to enable the discovery of new protein functions and uncover the relationship between protein sequence and function.
A method for efficient calculation of thermal stability of proteins upon point mutations.
TLDR
Comparison with the free energy perturbation (FEP) method and the recently developed machine learning methods on two different benchmark data sets shows that the current method is computationally efficient and also numerically reliable for predicting the changes in thermostability upon an arbitrary point mutation of a protein.
Biosystems Design by Machine Learning.
TLDR
This review describes commonly used models and modeling paradigms within ML and discusses successful applications at all scales of biosystems design, including nucleic acids, genetic circuits, proteins, pathways, genomes, and bioprocess.
Directed evolution of enzymes.
TLDR
The basic concepts of directed evolution of enzymes (DEE), the most used methodologies and current technical advancements are discussed, with examples of applications and perspectives, along with experimental evidence supporting mechanistic hypotheses of molecular evolution.

References

SHOWING 1-10 OF 52 REFERENCES
Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0
TLDR
The predictive power of this method, which is based on a formalism that highlights the coupling between four protein sequence and structure descriptors and takes into account the amino acid volume variation upon mutation, is shown to be significantly higher than that of other programs described in the literature.
NeEMO: a method using residue interaction networks to improve prediction of protein stability upon mutation
TLDR
NeEMO offers an innovative and reliable tool for the annotation of amino acid changes; a key contribution is its use of residue interaction networks (RINs), which can be used for modeling proteins and their interactions effectively.
Predicting protein stability changes from sequences using support vector machines
TLDR
A method based on support vector machines is presented that predicts the sign and the value of the free energy stability change upon single point mutation; the results corroborate the view that disease-related mutations correspond to a decrease in protein stability.
Prediction of protein stability changes for single‐site mutations using support vector machines
TLDR
The method can accurately predict protein stability changes using primary sequence information only, so it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods, which require tertiary information.
Protein stability: a single recorded mutation aids in predicting the effects of other mutations in the same amino acid site
TLDR
This work introduces a novel approach for the prediction of changes in protein stability that arise from a single-site amino acid substitution, named Pro-Maya (Protein Mutant stAbilitY Analyzer), which combines a collaborative filtering baseline model, Random Forests regression and a diverse set of features.
DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach
TLDR
DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimized predictor using Support Vector Machines (SVM).
mCSM: predicting the effects of mutations in proteins using graph-based signatures
TLDR
It is shown that mCSM can predict stability changes of a wide range of mutations occurring in the tumour suppressor protein p53, demonstrating the applicability of the proposed method in a challenging disease scenario.
Feature-based multiple models improve classification of mutation-induced stability changes
TLDR
The results of EASE-MM support the presumption that different interactions govern stability changes in the exposed and buried residues or in residues with a different secondary structure.
Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details.
TLDR
Six different methods previously reported as being able to predict the change in protein stability (ΔΔG) upon mutation are assessed: CC/PBSA, EGAD, FoldX, I-Mutant2.0, Rosetta and Hunter. There is still room for improvement, which is crucial if force fields are to perform better in their various tasks.
A three-state prediction of single point mutations on protein stability changes
TLDR
A support vector machine that, starting from the protein sequence or structure, discriminates between stabilizing, destabilizing and neutral mutations; it improves the quality of the prediction of the free energy change due to single point protein mutations by adopting a hypothesis of thermodynamic reversibility of the existing experimental data.