Corpus ID: 235390482

Adaptive machine learning for protein engineering

Brian L. Hie and Kevin K. Yang
Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatorial complexity of protein sequences. Here, we review how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine…

Machine-learning-guided directed evolution for protein engineering
The steps required to build machine-learning sequence–function models and to use those models to guide engineering are introduced, and the underlying principles of this engineering paradigm are illustrated with the help of case studies.
Data-driven computational protein design.
This work presents recent creative uses of multiple-sequence alignments, protein structures, and high-throughput functional assays in computational protein design.
Advances in machine learning for directed evolution.
Advances in approaches that train ML models on sequences to generate new functional sequence diversity are highlighted, focusing on strategies that use these generative models to efficiently explore vast regions of protein space.
Machine learning-assisted directed protein evolution with combinatorial libraries
It is proposed that the expense of experimentally testing a large number of protein variants can be decreased and the outcome can be improved by incorporating machine learning with directed evolution, and that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches.
Low-N protein engineering with data-efficient deep learning
A machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and screen ten million sequences via in silico directed evolution is introduced.
ProGen: Language Modeling for Protein Generation
This work poses protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lack costly structural annotations, and trains a 1.2B-parameter language model, ProGen, on ∼280M protein sequences conditioned on taxonomic and keyword tags.
Protein sequence design with deep generative models
In this review, recent applications of machine learning to generate protein sequences are highlighted, focusing on the emerging field of deep generative methods.
Protein Design and Variant Prediction Using Autoregressive Generative Models
This work introduces a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments, and successfully designs and tests a diverse 10^5-nanobody library that shows better expression than a 1000-fold larger synthetic library.
An evolution-based model for designing chorismate mutase enzymes
A process to learn the constraints for specifying proteins purely from evolutionary sequence data, design and build libraries of synthetic genes, and test them for activity in vivo using a quantitative complementation assay is described.
Model-based reinforcement learning for biological sequence design
A model-based variant of PPO, DyNA-PPO, is proposed to improve sample efficiency; it performs significantly better than existing methods in settings in which modeling is feasible, while still not performing worse in situations in which a reliable model cannot be learned.