Corpus ID: 103218

A Principled Comparative Analysis of Dimensionality Reduction Techniques on Protein Structure Decoy Data

@inproceedings{Pandit2016APC,
  title={A Principled Comparative Analysis of Dimensionality Reduction Techniques on Protein Structure Decoy Data},
  author={Rohan Pandit and Amarda Shehu},
  year={2016}
}
In this paper we investigate the utility of dimensionality reduction as a tool to analyze and simplify the structure space probed by de novo protein structure prediction methods. We conduct a principled comparative analysis in order to identify which techniques are effective and can be further used in decoy selection. The analysis allows drawing several interesting observations. For instance, many of the reportedly state-of-the-art non-linear dimensionality reduction techniques fare poorly and…

Reconstructing and Decomposing Protein Energy Landscapes to Organize Structure Spaces and Reveal Biologically-active States

This paper proposes a novel approach that reconstructs the underlying energy landscape populated by computed/sampled, energy-evaluated structures of a molecule and decomposes it into basins of attraction. It makes important steps toward addressing the open decoy selection problem in template-free protein structure prediction.

An Energy Landscape Treatment of Decoy Selection in Template-Free Protein Structure Prediction

This paper argues that, while comparing energies is not informative for structures that already populate minima of an energy function, the landscape view exposes the overall organization of generated decoys. It presents two different computational approaches to extracting such organization.

Reconstruction and Decomposition of High-Dimensional Landscapes via Unsupervised Learning

A novel, hybrid method is presented that combines strengths of these methods, allowing both visualization of the landscape and discovery of macrostates, and is of broad interest in cross-cutting problems that necessitate characterization of fitness and optimization landscapes.

Structure-Guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm

A novel, sampling-based algorithm to compute transition paths that adapts the probabilistic roadmap framework that is popular in robot motion planning and allows investigating hypotheses regarding the order of experimentally-known structures in a transition event.

A Comparison of Some Dimension Reduction Techniques with Varied Parameters

This paper presents and explains several methods of dimensionality reduction of data sets, beginning with the well known PCA and moving onto techniques that deal with data on a nonlinear manifold.

Development of Supervised Learning Predictive Models for Highly Non-linear Biological, Biomedical, and General Datasets

This paper presents RV-Clustering, a library of unsupervised learning algorithms, and a new methodology designed to find optimum partitions within highly non-linear datasets. These partitions allow deconvoluting variables and notably improve performance metrics in supervised learning classification or regression models.

References

A Data-Driven Evolutionary Algorithm for Mapping Multibasin Protein Energy Landscapes

Applications on wildtype and variant sequences of proteins involved in proteinopathies demonstrate that the algorithm makes an important first step toward understanding the impact of sequence mutations on misfunction by providing the energy landscape as the intermediate explanatory link between protein sequence and function.

Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction

The proposed method to obtain a few collective coordinates by using nonlinear dimensionality reduction can efficiently find a low-dimensional representation of a complex process such as protein folding.

Mapping the Conformation Space of Wildtype and Mutant H-Ras with a Memetic, Cellular, and Multiscale Evolutionary Algorithm

This paper proposes a novel algorithm, SIfTER, which is based instead on stochastic optimization to circumvent the computational challenge of exploring the breadth of a protein’s structure space, and applies it to variant sequences of the H-Ras catalytic domain.

Principal component analysis for protein folding dynamics.

Computing transition paths in multiple-basin proteins with a probabilistic roadmap algorithm guided by structure data

A novel, sampling-based algorithm to compute transition paths that adapts the probabilistic roadmap framework that is popular in robot motion planning and allows investigating hypotheses regarding the order of experimentally-known structures in a transition event.

Determination of reaction coordinates via locally scaled diffusion map.

The technique is general enough to be applied to any system for which a Boltzmann-sampled set of molecular configurations is available, and the resulting global coordinates are correlated with the time scales of the molecular motion.

Announcing the worldwide Protein Data Bank

The creation of the wwPDB formalizes the international character of the PDB and ensures that the archive remains single and uniform, and provides a mechanism to ensure consistent data for software developers and users worldwide.

Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis

This work applies the force paradigm to create localized versions of MDS stress functions with a tuning parameter to adjust the strength of nonlocal repulsive forces and solves the problem of tuning parameter selection with a meta-criterion that measures how well the sets of K-nearest neighbors agree between the data and the embedding.

Rank-based quality assessment of nonlinear dimensionality reduction

This paper reviews some of the existing quality measures that are based on distance ranking and K-ary neighborhoods, and draws an analogy between the co-ranking matrix and a Shepard diagram.