Corpus ID: 17274507

Statistical Inference for Big Data Problems in Molecular Biophysics

Arvind Ramanathan, Andrej J. Savol, Virginia M. Burger, Shannon P. Quinn, Pratul K. Agarwal, Chakra Chennubhotla

We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, which are increasingly complex and now reach petabyte scales, promise a detailed view into microscopic behavior, teasing out the important… 


Deep clustering of protein folding simulations

It is shown that the CVAE model can quantitatively describe complex biophysical processes such as protein folding and can learn latent features of folding that transfer to other, independent trajectories. This makes it particularly attractive for identifying intrinsic features that correspond to conformational substates sharing similar structural features.

Distributed Spectral Graph Methods for Analyzing Large-Scale Unstructured Biomedical Data

A quantitative model of ciliary motion phenotypes is developed using spectral graph methods for unsupervised latent pattern discovery. A distributed hierarchical eigensolver, which plays an essential role in discovering novel ciliary motion phenotypes and in identifying physicochemical-perceptual associations, is compared directly with other popular solvers.
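
Spectral graph methods of this kind build a similarity graph over data points and read cluster structure from eigenvectors of its Laplacian. The sketch below is a minimal illustration, assuming a small dense affinity matrix and plain power iteration in place of the distributed hierarchical eigensolver the paper describes. The sign pattern of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue) splits the graph in two; the function names are hypothetical.

```python
import math
import random


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def fiedler_vector(W, iters=2000, seed=0):
    """Approximate the Fiedler vector by power iteration on c*I - L.

    The constant vector (eigenvalue 0 of L) is projected out at every step, so the
    iteration converges to the eigenvector of the second-smallest eigenvalue of L.
    """
    L = laplacian(W)
    n = len(L)
    # Every eigenvalue of L is at most twice the largest degree, so c*I - L is
    # positive semidefinite and its largest remaining eigenvalue is c - lambda_2.
    c = 2.0 * max(L[i][i] for i in range(n))
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the trivial constant component
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(W):
    """Label each node 0 or 1 by the sign of its Fiedler-vector entry."""
    return [0 if x < 0 else 1 for x in fiedler_vector(W)]
```

For example, two triangles joined by a single weak edge are split into the two triangles. Which half is labeled 0 depends on the random start, so only the grouping is meaningful.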

AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics

A generalizable AI-driven workflow is developed that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems and demonstrates how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.

Benchmarking Machine Learning Workloads in Structural Bioinformatics Applications

This paper presents an overview of different learning approaches in structural bioinformatics applications and of performance considerations for such coupled applications, and outlines the development of performance metrics; the authors hope it could serve as a framework for other application domains.

AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics

A generalizable AI-driven workflow is developed that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems and presents several novel scientific discoveries, including the elucidation of the spike’s full glycan shield and the characterization of the flexible interactions between the spike and the human ACE2 receptor.

Challenges and frontiers of computational modelling of biomolecular recognition

The challenges of characterising biomolecular binding, and the computational approaches developed to address them, are reviewed, including molecular docking, molecular dynamics simulations (especially enhanced sampling), and machine learning.

References

Event detection and sub‐state discovery from biomolecular simulations using higher‐order statistics: Application to enzyme adenylate kinase

HOST4MD, a higher‐order statistical toolbox for molecular dynamics simulations, is presented. It identifies key dynamical events while simulations are in progress, explores potential sub‐states, and identifies the conformational transitions that enable the protein to access those sub‐states.
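
One simple way to use higher-order statistics for event detection is to track a non-Gaussianity measure, such as excess kurtosis (the fourth standardized moment minus 3), over a sliding window of a scalar simulation observable and flag windows where it spikes. The sketch below works under that assumption. The window size, the threshold, and the single-observable input are illustrative choices, and `flag_events` is a hypothetical helper rather than HOST4MD's actual interface.

```python
def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0.0:
        return 0.0  # constant window: no spread, treat as no event
    return m4 / (m2 * m2) - 3.0


def flag_events(series, window=50, threshold=1.0):
    """Return start indices of windows whose excess kurtosis exceeds threshold.

    Hypothetical helper: a sudden, rare excursion in an otherwise steady
    observable produces a heavy-tailed window and hence a large kurtosis.
    """
    flagged = []
    for start in range(len(series) - window + 1):
        if excess_kurtosis(series[start:start + window]) > threshold:
            flagged.append(start)
    return flagged
```

A single large excursion in a bounded, steady signal is flagged by every window that contains it, while windows without it stay near the negative kurtosis of the background signal.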

On-the-Fly Identification of Conformational Substates from Molecular Dynamics Simulations.

It is demonstrated that the patterns discovered by DTA often correspond to functionally important conformational substates, and that DTA is well-suited to analyzing long-timescale simulations, which are critical for studying biologically relevant motions but may be too large for traditional analysis methods.

Dynameomics: a comprehensive database of protein dynamics.

Progress and challenges in the automated construction of Markov state models for full protein systems.

This work demonstrates the application of a toolkit for automating the construction of Markov state models (MSMs) to the villin headpiece (HP-35 NleNle), one of the smallest and fastest-folding proteins, and shows that the resulting MSM captures both the thermodynamics and the kinetics of the original molecular dynamics of the system.
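
At its core, a Markov state model is a row-stochastic transition matrix estimated by counting transitions between discrete conformational states at a fixed lag time. The sketch below is minimal. It assumes that frames have already been assigned to states, so the state-decomposition step the toolkit automates is not shown. It also uses simple row normalization of counts rather than a reversible estimator. The function names are hypothetical.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts C[i][j] / sum_j C[i][j] at the given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1
    T = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total:
            T.append([c / total for c in row])
        else:
            # No observed outgoing transitions: treat the state as absorbing.
            T.append([1.0 if j == i else 0.0 for j in range(n_states)])
    return T


def stationary_distribution(T, iters=10000, tol=1e-12):
    """Left eigenvector of T with eigenvalue 1, found by repeating p <- p T.

    Converges for irreducible, aperiodic chains; stops early once successive
    iterates differ by less than tol in every component.
    """
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(iters):
        q = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p
```

The stationary distribution gives the model's equilibrium state populations (thermodynamics), while the transition matrix itself carries the kinetics at the chosen lag time.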

Full correlation analysis of conformational protein dynamics

FCA should provide improved collective degrees of freedom for dimension‐reduced descriptions of macromolecular dynamics; this improvement is shown to be due to a strongly increased anharmonicity of FCA modes compared with the respective PCA modes.
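
To make "anharmonicity of PCA modes" concrete, the sketch below computes the leading PCA mode of a set of coordinate frames by power iteration on the covariance matrix. It then reports the excess kurtosis of the frames projected onto that mode. For a harmonic mode sampled at equilibrium, the projection is Gaussian and the value is near 0. Using kurtosis as the anharmonicity measure is an assumption for illustration. FCA itself, which goes beyond PCA's linear (second-order) correlations, is not reproduced here, and the function names are hypothetical.

```python
import math
import random


def covariance(frames):
    """Covariance matrix of coordinates; frames is a list of equal-length vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    centered = [[f[k] - mean[k] for k in range(d)] for f in frames]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    return cov, centered


def first_pca_mode(cov, iters=500, seed=0):
    """Leading eigenvector of a symmetric covariance matrix by power iteration."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def mode_excess_kurtosis(frames):
    """Project centered frames onto PC1 and return the projection's excess kurtosis.

    Values near 0 are consistent with a Gaussian (harmonic) mode; large positive
    or negative values (e.g. about -2 for two well-separated states) indicate
    anharmonic motion along that direction.
    """
    cov, centered = covariance(frames)
    v = first_pca_mode(cov)
    proj = [sum(c[k] * v[k] for k in range(len(v))) for c in centered]
    m2 = sum(p * p for p in proj) / len(proj)
    m4 = sum(p ** 4 for p in proj) / len(proj)
    return m4 / (m2 * m2) - 3.0
```

A trajectory that hops between two well-separated states along one direction has a PC1 projection with excess kurtosis close to -2. PCA cannot distinguish this from harmonic motion of the same variance.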

Clustering Molecular Dynamics Trajectories: 1. Characterizing the Performance of Different Clustering Algorithms.

It is found that there is no perfect "one size fits all" algorithm for clustering MD trajectories and that the results depend strongly on the choice of atoms used for the pairwise comparison; overall, the best performance was observed with the average-linkage, means, and SOM algorithms.
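
Average-linkage clustering, one of the better-performing algorithms in that comparison, merges at each step the two clusters whose mean pairwise distance is smallest. The sketch below is a naive version that assumes a precomputed symmetric frame-to-frame distance matrix. For MD data this would typically be pairwise RMSD over a chosen atom selection, which is the choice the study found results to depend on. `average_linkage` is a hypothetical name.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage on a distance matrix.

    Returns a list of clusters, each a sorted list of frame indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        # Delete y first (y > x) so index x still points at the right cluster.
        del clusters[y]
        clusters[x] = merged
    return clusters
```

Recomputing every linkage on each pass keeps the code short, but it scales poorly with the number of frames. Production tools cache and update inter-cluster distances instead.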

Automated Event Detection and Activity Monitoring in Long Molecular Dynamics Simulations.

This paper presents automated methods for the detection of potentially important structure-changing events in long MD trajectories and provides a detailed report of broken and formed contacts that aids in the identification of specific time-dependent side-chain interactions.
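
A report of broken and formed contacts can be illustrated by thresholding inter-residue distances into a set of contacting pairs per frame and diffing consecutive frames. The sketch below is minimal and assumes one representative coordinate per residue. The 8 Å cutoff and the minimum sequence separation are illustrative defaults, not necessarily the paper's choices, and the function names are hypothetical.

```python
import math
from itertools import combinations


def contacts(coords, cutoff=8.0, min_separation=3):
    """Set of residue pairs (i, j), i < j, closer than cutoff in one frame.

    coords is a list of (x, y, z) tuples, one per residue. Pairs closer than
    min_separation in sequence are skipped, since they are trivially in contact.
    """
    pairs = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff:
            pairs.add((i, j))
    return pairs


def contact_changes(trajectory, cutoff=8.0):
    """Yield (frame, formed, broken) for every frame whose contact set changes."""
    previous = contacts(trajectory[0], cutoff)
    for t in range(1, len(trajectory)):
        current = contacts(trajectory[t], cutoff)
        formed, broken = current - previous, previous - current
        if formed or broken:
            yield t, sorted(formed), sorted(broken)
        previous = current
```

Frames with no yielded change are quiet stretches. In a long trajectory, the yielded frames and their formed and broken pairs form the time-stamped contact report.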

MDAnalysis: A toolkit for the analysis of molecular dynamics simulations

MDAnalysis is an object‐oriented library for structural and temporal analysis of molecular dynamics simulation trajectories and individual protein structures that uses the powerful NumPy package to expose trajectory data as fast and efficient NumPy arrays.

Discovering Conformational Sub-States Relevant to Protein Function

Quasi-anharmonic analysis (QAA) provides a novel framework to intuitively understand the biophysical basis of conformational diversity and its relevance to protein function.

Transiently populated intermediate functions as a branching point of the FF domain folding pathway

This study establishes the FF domain intermediate as a central player in both folding and misfolding pathways and illustrates how incomplete folding can lead to the formation of higher-order structures.