• Corpus ID: 17274507

Statistical Inference for Big Data Problems in Molecular Biophysics

Arvind Ramanathan, Andrej J. Savol, Virginia M. Burger, Shannon P. Quinn, Pratul K. Agarwal, Chakra Chennubhotla
We highlight the role of statistical inference techniques in providing biological insights from the analysis of long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex and presently reaching petabyte scales, promise a detailed view into microscopic behavior, teasing out the important…

Distributed Spectral Graph Methods for Analyzing Large-Scale Unstructured Biomedical Data

A quantitative model of ciliary motion phenotypes is developed using spectral graph methods for unsupervised latent pattern discovery, and a distributed hierarchical eigensolver is compared directly to other popular solvers for its essential role in enabling the discovery of novel ciliary motion phenotypes and in identifying physiochemical-perceptual associations.

AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics

A generalizable AI-driven workflow is developed that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems and demonstrates how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.

Benchmarking Machine Learning Workloads in Structural Bioinformatics Applications

This paper presents an overview of different learning approaches in structural bioinformatics applications, discusses performance considerations for such coupled applications, and outlines the development of performance metrics, in the hope that this could serve as a framework for other application domains.

AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics

A generalizable AI-driven workflow is developed that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems and presents several novel scientific discoveries, including the elucidation of the spike’s full glycan shield and the characterization of the flexible interactions between the spike and the human ACE2 receptor.

Challenges and frontiers of computational modelling of biomolecular recognition

The challenges of characterising biomolecular binding and the computational approaches developed for it, including molecular docking, molecular dynamics simulations (especially enhanced sampling), and machine learning, are reviewed.

Deep clustering of protein folding simulations

The CVAE model can quantitatively describe complex biophysical processes such as protein folding, and can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features.



Event detection and sub‐state discovery from biomolecular simulations using higher‐order statistics: Application to enzyme adenylate kinase

HOST4MD is presented: a higher‐order statistical toolbox for molecular dynamics simulations, which identifies key dynamical events as simulations are in progress, explores potential sub‐states, and identifies conformational transitions that enable the protein to access those sub‐states.

On-the-Fly Identification of Conformational Substates from Molecular Dynamics Simulations.

It is demonstrated that the patterns discovered by DTA often correspond to functionally important conformational substates and that DTA is well-suited to analyzing long timescale simulations, which are critical for studying biologically relevant motions but may be too large for traditional analysis methods.

Dynameomics: a comprehensive database of protein dynamics.

Full correlation analysis of conformational protein dynamics

FCA should provide improved collective degrees of freedom for dimension‐reduced descriptions of macromolecular dynamics; the improvement is shown to be due to a strongly increased anharmonicity of FCA modes as compared to the respective PCA modes.

Clustering Molecular Dynamics Trajectories: 1. Characterizing the Performance of Different Clustering Algorithms.

There is no perfect "one size fits all" algorithm for clustering MD trajectories, and the results strongly depend on the choice of atoms for the pairwise comparison; the best performance was observed with the average-linkage, means, and SOM algorithms.
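
To make the average-linkage approach mentioned above concrete, here is a minimal sketch in plain Python. It is not the implementation evaluated in the cited paper: the toy frames, the unaligned `rmsd` helper, and the `average_linkage` function are all hypothetical names chosen for illustration, and the distance uses every atom in each frame.

```python
from math import sqrt

def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) atoms (assumption: no alignment)."""
    sq = sum((p - q) ** 2 for atom_a, atom_b in zip(a, b) for p, q in zip(atom_a, atom_b))
    return sqrt(sq / len(a))

def average_linkage(frames, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(frames))]

    def linkage(c1, c2):
        # Average-linkage criterion: mean distance over all cross-cluster pairs.
        return sum(rmsd(frames[i], frames[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical toy trajectory: four frames of two atoms each, forming two obvious groups.
FRAMES = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(5.2, 0.0, 0.0), (6.1, 0.0, 0.0)],
]
labels = average_linkage(FRAMES, n_clusters=2)
```

Because the summary stresses that results depend on which atoms enter the pairwise comparison, a realistic version would restrict `rmsd` to a chosen atom subset and align frames first; both steps are omitted here to keep the sketch short.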

Discovering Conformational Sub-States Relevant to Protein Function

Quasi-anharmonic analysis (QAA) provides a novel framework to intuitively understand the biophysical basis of conformational diversity and its relevance to protein function.

Transiently populated intermediate functions as a branching point of the FF domain folding pathway

This study establishes the FF domain intermediate as a central player in both folding and misfolding pathways and illustrates how incomplete folding can lead to the formation of higher-order structures.

Hidden alternate structures of proline isomerase essential for catalysis

Dual strategies of ambient-temperature X-ray crystallographic data collection and automated electron-density sampling are introduced to structurally unravel interconverting substates of the human proline isomerase, cyclophilin A (CYPA).

Accessing a Hidden Conformation of the Maltose Binding Protein Using Accelerated Molecular Dynamics

Periplasmic binding proteins (PBPs) are a large family of molecular transporters that play a key role in nutrient uptake and chemotaxis in Gram-negative bacteria. All PBPs have characteristic

A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories

  • Tiankai Tu, C. Rendleman, D. Shaw
  • Computer Science
    2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2008
HiMach is a new parallel analysis framework that allows users to write trajectory analysis programs sequentially and carries out the parallel execution of the programs automatically; it extends the original MapReduce model to support multiple rounds of analysis.
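
The MapReduce idea behind this kind of trajectory analysis can be sketched in a few lines of plain Python. This is not HiMach's API: `map_frame`, `reduce_bin`, `map_reduce`, and the toy trajectory are hypothetical names for illustration, and the "runtime" here is a sequential stand-in for what a real framework would distribute across processes.

```python
from collections import defaultdict
from math import dist

# Hypothetical toy trajectory: each frame is a list of (x, y, z) atom positions.
TRAJECTORY = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
]

def map_frame(frame_index, frame):
    """Map step: emit (key, value) pairs for one frame, independently of all others."""
    d = dist(frame[0], frame[1])   # distance between atom 0 and atom 1
    yield int(d), frame_index      # key = 1-unit-wide distance bin

def reduce_bin(bin_key, frame_indices):
    """Reduce step: combine every value that shares a key."""
    return bin_key, sorted(frame_indices)

def map_reduce(trajectory, mapper, reducer):
    """Sequential stand-in for a parallel runtime: map, group by key, then reduce."""
    groups = defaultdict(list)
    for i, frame in enumerate(trajectory):
        for key, value in mapper(i, frame):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

result = map_reduce(TRAJECTORY, map_frame, reduce_bin)
```

The user-written pieces (`map_frame`, `reduce_bin`) contain no parallel code, which is the property the summary describes; a multi-round analysis would feed `result` into a second `map_reduce` call.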