Corpus ID: 222178155

Combination of digital signal processing and assembled predictive models facilitates the rational design of proteins

@article{MedinaOrtiz2020CombinationOD,
  title={Combination of digital signal processing and assembled predictive models facilitates the rational design of proteins},
  author={David Medina-Ortiz and Sebasti{\'a}n Contreras and Juan Amado-Hinojosa and Jorge Torres-Almonacid and Juan A. Asenjo and Marcelo A. Navarrete and {\'A}lvaro Olivera-Nappa},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.03516}
}
Predicting the effect of mutations in proteins is one of the most critical challenges in protein engineering; by knowing the effect a substitution of one (or several) residues in the protein's sequence has on its overall properties, one could design a variant with a desirable function. New strategies and methodologies to create predictive models are continually being developed. However, those that claim to be general often do not reach adequate performance, and those that aim at a particular task…

Peptipedia: a comprehensive database for peptide research supported by Assembled predictive models and Data Mining approaches
TLDR
Peptipedia, a user-friendly database and web application to search, characterise and analyse peptide sequences, is developed; it integrates the information from thirty previously reported databases, making it the largest repository of peptides with recorded activities so far.
Peptipedia: a user-friendly web application and a comprehensive database for peptide research supported by Machine Learning approach
TLDR
Peptipedia, a user-friendly web application and comprehensive database to search, characterize and analyse peptide sequences, is developed, making it the biggest repository of peptides with recorded activities to date.

References

SHOWING 1-10 OF 59 REFERENCES
Variational auto-encoding of protein sequences
TLDR
An embedding of natural protein sequences using a Variational Auto-Encoder is presented and used to predict how mutations affect protein function, to computationally guide exploration of protein sequence space, and to better inform rational and automatic protein design.
A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes
TLDR
This work presents an innovative sequence-activity relationship (innov’SAR) methodology based on digital signal processing, combining wet-lab experimentation and computational protein design, and illustrates its application to improving the enantioselectivity of an epoxide hydrolase from Aspergillus niger.
Application of fourier transform and proteochemometrics principles to protein engineering
TLDR
iSAR (innovative Sequence Activity Relationship) is a fast algorithm which can be implemented with limited computational resources and can make effective predictions even if the training set is limited in size.
Interpretable Numerical Descriptors of Amino Acid Space
  • A. Georgiev
  • Mathematics, Computer Science
  • J. Comput. Biol.
  • 2009
TLDR
Conclusions from this study highlight the discord between ease of interpretation of amino acid scales and their relevance to protein structure conservation, as well as general considerations for designing custom scale sets.
Rational Designing of Novel Proteins Through Computational Approaches
TLDR
This chapter will discuss several of the rational designing computational tools that are capable of obtaining structures of unknown polypeptide chains and characterizing the functional hotspots, thus aiding researchers in designing novel functional motifs with minimal bench work.
A multivariate clustering of AAindex database for protein numerical representation
TLDR
This paper aims at the construction of new indices through clustering of the AAindex database with correlation distance and suggests that, because these new maps correlate with groups of AAindex indices (in clusters), they have the potential to be used for numerical representation of protein sequences in different studies.
Machine learning-assisted directed protein evolution with combinatorial libraries
TLDR
It is proposed that the expense of experimentally testing a large number of protein variants can be decreased and the outcome can be improved by incorporating machine learning with directed evolution, and that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches.
Mutagenesis Objective Search and Selection Tool (MOSST): an algorithm to predict structure-function related mutations in proteins
TLDR
A new and powerful methodology is proposed to guide two decision strategies, based only on conservation rules of physicochemical properties of amino acids extracted from a multiple alignment of the protein family to which the target protein belongs, with no need for explicit structure-function relationships.
Effective DNA binding protein prediction by using key features via Chou's general PseAAC.
TLDR
This paper extracts several features solely from the protein sequence and carries out two different types of feature selection on them; the results proved comparable on the training set and significantly improved on the independent test set.
Learned protein embeddings for machine learning
TLDR
The predictive power of Gaussian process models trained using embeddings is comparable to those trained on existing representations, which suggests that embeddings enable accurate predictions despite having orders of magnitude fewer dimensions.