BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces

@article{Ojewole2018BBKA,
  title={BBK* (Branch and Bound Over K*): A Provable and Efficient Ensemble-Based Protein Design Algorithm to Optimize Stability and Binding Affinity Over Large Sequence Spaces},
  author={Adegoke A. Ojewole and Jonathan D. Jou and Vance G. Fowler and Bruce Randall Donald},
  journal={Journal of computational biology : a journal of computational molecular cell biology},
  year={2018},
  volume={25},
  number={7},
  pages={726--739}
}
  • Adegoke A. Ojewole, J. D. Jou, B. Donald
  • Published 1 July 2018
  • Computer Science, Biology
  • Journal of computational biology : a journal of computational molecular cell biology
Computational protein design (CPD) algorithms that compute binding affinity, Ka, search for sequences with an energetically favorable free energy of binding. Recent work shows that three principles improve the biological accuracy of CPD: ensemble-based design, continuous flexibility of backbone and side-chain conformations, and provable guarantees of accuracy with respect to the input. However, previous methods that use all three design principles are single-sequence (SS) algorithms, which are… 

Minimization-Aware Recursive K*: A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape
TLDR
A novel algorithm, Minimization-Aware Recursive K* (MARK*), tightens bounds not on single conformations but on distinct regions of the conformation space, which both accelerates existing designs and offers new capabilities not possible with previous algorithms.
Variable Neighborhood Search with Cost Function Networks To Solve Large Computational Protein Design Problems
TLDR
Variable neighborhood search (VNS) with cost function networks is a powerful framework that can provide tight upper bounds on the global minimum energy. A new CPD heuristic based on VNS is proposed in which a subset of the solution space (a "neighborhood") is explored, and the size of that subset is gradually increased with a dedicated probabilistic heuristic.
AND/OR Branch-and-Bound for Computational Protein Design Optimizing K*
TLDR
This work introduces AOBB-KMAP, a new branch-and-bound algorithm over AND/OR search spaces for solving the KMAP problem, formulating CPD as a graphical model for K* optimization and providing a new, efficient algorithm.
Minimization-Aware Recursive K* (MARK*): A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape
TLDR
A novel algorithm, Minimization-Aware Recursive K* (MARK*), tightens bounds not on single conformations, but instead on distinct regions of the conformation space, and provably approximates the energy landscape.
A C++ library for protein sub-structure search
TLDR
An entirely reorganized approach to database representation now enables large structural databases to be stored in memory, further simplifying development of automated search-based methods.
Molecular flexibility in computational protein design: an algorithmic perspective.
TLDR
The principles of CPD are outlined and recent effort in algorithmic developments for incorporating molecular flexibility in the design process are discussed, to help relieve the inaccuracies resulting from these simplifications and to improve design reliability.
Modularity‐based parallel protein design algorithm with an implementation using shared memory programming
TLDR
The shared memory implementation of modularity‐based parallel sequence search leads to better search space exploration compared to the case of traditional full protein design and can be extended to protein interaction design as well.
Resistor: an algorithm for predicting resistance mutations using Pareto optimization over multistate protein design and mutational signatures
TLDR
By exploiting the wealth of structural and sequence data available in the form of molecular structures and mutational signatures, Resistor is a general method for predicting resistance mutations that can be applied to a wide variety of cancer, antimicrobial, antiviral and antifungal drug targets.
Computational Analysis of Energy Landscapes Reveals Dynamic Features that Contribute to Binding of Inhibitors to CFTR-Associated Ligand.
TLDR
A crystal structure of kCAL01 bound to CALP is reported, and its structural features are compared against those of iCAL36, a previously developed inhibitor of CALP. The results suggest not only that ensemble-based design captured thermodynamically significant features observed in vitro, but also that a design eschewing ensembles would miss the kCAL01 sequence entirely.
Computational Analysis of Energy Landscapes Reveals Dynamic Features that Contribute to Binding of Inhibitors to CFTR-Associated Ligand
TLDR
A crystal structure of kCAL01 bound to CALP is reported; it suggests not only that ensemble-based design captured thermodynamically significant features observed in vitro, but also that a design algorithm eschewing ensembles would likely miss the kCAL01 sequence entirely.

References

SHOWING 1-10 OF 69 REFERENCES
COMETS (Constrained Optimization of Multistate Energies by Tree Search): A Provable and Efficient Protein Design Algorithm to Optimize Binding Affinity and Specificity with Respect to Sequence
TLDR
COMETS provides a new level of versatile, efficient, and provable multistate design: it provably returns the minimum with respect to sequence of any desired linear combination of the energies of multiple protein states, subject to constraints on other linear combinations.
LUTE (Local Unpruned Tuple Expansion): Accurate Continuously Flexible Protein Design with General Energy Functions and Rigid Rotamer-Like Efficiency
TLDR
This work models continuous flexibility and non-residue-pairwise energies in a form suitable for direct input to highly efficient, discrete combinatorial optimization algorithms such as DEE/A* or branch-width minimization.
LUTE (Local Unpruned Tuple Expansion): Accurate Continuously Flexible Protein Design with General Energy Functions and Rigid Rotamer-Like Efficiency
TLDR
A novel algorithm performs a local unpruned tuple expansion (LUTE), which can efficiently represent both continuous flexibility and general, possibly nonpairwise energy functions to an arbitrary level of accuracy using a discrete energy matrix.
BWM*: A Novel, Provable, Ensemble-Based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design
TLDR
A novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) enumerates a gap-free ensemble of conformations in order of increasing energy and outperforms the classical search algorithm A* in 49 of 67 protein design problems.
Fast gap‐free enumeration of conformations and sequences for protein design
TLDR
Two classes of algorithmic improvements to the A* algorithm are presented that greatly increase the efficiency of A* and enable all A*‐based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs.
Guaranteed Discrete Energy Optimization on Large Protein Design Problems.
TLDR
An exact deterministic method combining branch and bound, arc consistency, and tree-decomposition is used to provably identify the global minimum energy sequence-conformation on full-redesign problems, defining search spaces of size up to 10^234.
Fast search algorithms for computational protein design
TLDR
CFN technology is improved and injected into the well‐established CPD package Osprey so that all of Osprey's CPD algorithms benefit from the associated speedups, making it possible to solve larger CPD problems with provable algorithms.
Improved energy bound accuracy enhances the efficiency of continuous protein design
TLDR
Two methods are presented that greatly increase the speed and efficiency of protein design with continuous rotamers. By tightening the energy bounds, additional pruning of the conformation space can be achieved, and the number of conformations that must be enumerated to find the global minimum energy conformation is greatly reduced.
A new framework for computational protein design through cost function network optimization
TLDR
It is shown that the CFN-based approach is able to solve to optimality a variety of complex designs that could often not be solved using a usual CPD-dedicated tool or state-of-the-art exact operations research tools.
Improved Pruning algorithms and Divide-and-Conquer strategies for Dead-End Elimination, with application to protein design
TLDR
Novel enhancements to both the DEE and MinDEE criteria are presented, which result in a speedup of up to a factor of more than 1000 when applied in redesign for three different proteins: Gramicidin Synthetase A, plastocyanin, and protein G.