An evolution-based model for designing chorismate mutase enzymes

@article{Russ2020AnEM,
  title={An evolution-based model for designing chorismate mutase enzymes},
  author={William P. Russ and Matteo Figliuzzi and Christian Stocker and Pierre Barrat-Charlaix and Michael Socolich and Peter Kast and Donald Hilvert and R{\'e}mi Monasson and Simona Cocco and Martin Weigt and Rama Ranganathan},
  journal={Science},
  year={2020},
  volume={369},
  pages={440--445}
}
Learning from evolution
Protein sequences contain information specifying their three-dimensional structure and function, and statistical analysis of families of sequences has been used to predict these properties. Building from sequence data, Russ et al. used statistical models that take into account conservation at amino acid positions and correlations in the evolution of pairs of amino acids to predict new artificial sequences that will have the properties of the protein family. For the…
Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design.
TLDR
The utility of sequence data in protein engineering and design is discussed, focusing on recent advances in three main areas: the use of ancestral sequence reconstruction as an engineering tool to generate thermostable and multifunctional proteins, the use of sequence data to guide engineering of multipoint mutants by structure-based computational protein design, and the use of unlabeled sequence data for unsupervised and semisupervised machine learning.
Aligning biological sequences by exploiting residue conservation and coevolution.
TLDR
DCAlign is presented, an efficient alignment algorithm based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include coevolution among positions in a general way, and to be therefore universally applicable to protein- and RNA-sequence alignment without the need of using complementary structural information.
Aligning biological sequences by exploiting residue conservation and coevolution
TLDR
DCAlign is presented, an efficient approach based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include general second-order interactions among positions and to be therefore universally applicable to protein- and RNA-sequence alignment.
Large-scale design and refinement of stable proteins using sequence-only models
TLDR
A neural network model is built that predicts protein stability given only sequences of amino acids, and it is shown that the predictive model can be used to substantially increase the stability of both expert-designed and model-generated proteins.
Improving sequence-based modeling of protein families using secondary structure quality assessment.
TLDR
This work introduces two scoring functions characterizing the likeliness of the secondary structure of a protein sequence to match a reference structure, called Dot Product and Pattern Matching, and shows improvement in the detection of non-functional sequences.
Improving sequence-based modeling of protein families using secondary structure quality assessment
TLDR
Two scoring functions characterizing the likeliness of the secondary structure of a protein sequence to match a reference structure are introduced, called Dot Product and Pattern Matching, which help reject non-functional sequences generated by graphical models learned from homologous sequence alignments.
Navigating the amino acid sequence space between functional proteins using a deep learning framework
TLDR
The ability of deep learning frameworks to model biological complexity and bring new tools to explore amino acid sequence and functional spaces is confirmed.
Large-scale design and refinement of stable proteins using sequence-only models
TLDR
A neural network model is reported that predicts protein stability based only on sequences of amino acids, and its performance is demonstrated by evaluating the stability of almost 200,000 novel proteins, providing a baseline for future work in the field.
Enhancing computational enzyme design by a maximum entropy strategy
TLDR
It is shown that the statistical energy inferred from homologous sequences with the maximum entropy (MaxEnt) principle significantly correlates with enzyme catalysis and stability at the active site region and the more distant region, respectively.
adabmDCA: adaptive Boltzmann machine learning for biological sequences
TLDR
The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of the inferred contact map as well as of the synthetically generated sequences.

References

Selection of sequence motifs and generative Hopfield-Potts models for protein families.
TLDR
It is shown that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models, and an approach to parameter reduction is proposed, which is based on selecting collective sequence motifs.
Natural-like function in artificial WW domains
TLDR
Construction of artificial protein sequences directed only by the SCA showed that the information extracted by this analysis is sufficient to engineer the WW fold at atomic resolution, and it was demonstrated that these artificial WW sequences function like their natural counterparts, showing class-specific recognition of proline-containing target peptides.
Evolutionary information for specifying a protein fold
TLDR
This work attempts to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information.
Learning protein constitutive motifs from sequence data
TLDR
It is shown that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information and be used to unveil and exploit the genotype–phenotype relationship for protein families.
Functional Mapping of Protein-Protein Interactions in an Enzyme Complex by Directed Evolution
TLDR
Several MtCM variants, purified using a novel plasmid-based T7 RNA polymerase gene expression system, showed that a diminished ability to physically interact with MtDS correlates with reduced activatability and feedback regulatory control by Tyr and Phe.
Protein structure prediction from sequence variation
TLDR
Computation of covariation patterns is expected to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.
Quantification of the effect of mutations using a global probability model of natural sequence variation
TLDR
This work presents a statistical approach for quantifying the contribution of residues and their interactions to protein function, using a statistical energy, the evolutionary Hamiltonian, and finds that these probability models predict the experimental effects of mutations with reasonable accuracy for a number of proteins.
Coevolution-based inference of amino acid interactions underlying protein function
TLDR
Deep-mutation technologies are extended to enable measurement of many thousands of pairwise amino acid couplings in several homologs of a protein family – a deep coupling scan (DCS).
Protein structure determination using metagenome sequence data
TLDR
It is shown that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling.