Unique identifiers for small molecules enable rigorous labeling of their atoms

Hesam T. Dashti, William M. Westler, John L. Markley, Hamid R. Eghbalnia
Scientific Data

Rigorous characterization of small organic molecules in terms of their structural and biological properties is vital to biomedical research. The three-dimensional structure of a molecule, its ‘photo ID’, is inefficient for searching and matching tasks. Instead, identifiers play a key role in accessing compound data. Unique and reproducible molecule and atom identifiers are required to ensure the correct cross-referencing of properties associated with compounds archived in databases. The best… 

Approach to Improving the Quality of Open Data in the Universe of Small Molecules

An approach to improving the quality and interoperability of open data related to small molecules, such as metabolites, drugs, natural products, food additives, and environmental contaminants, by computer implementation of an extended version of the IUPAC International Chemical Identifier system.

Automated evaluation of consistency within the PubChem Compound database

The ALATIS approach, which is based on the International Chemical Identifier (InChI) model, is applied to the full PubChem Compound database to generate unique and reproducible compound and atom identifiers for all entries for which three-dimensional structures were available.

Atom Identifiers Generated by a Graph Coloring Method Enable Compound Harmonization Across Metabolic Databases

A graph coloring method that creates unique identifiers for each atom in a compound, facilitating construction of an atom-resolved metabolic network; the method is guaranteed to generate the same identifier for symmetric atoms, enabling automatic identification of possible additional mappings caused by molecular symmetry.

Atom Identifiers Generated by a Neighborhood-Specific Graph Coloring Method Enable Compound Harmonization across Metabolic Databases

A neighborhood-specific graph coloring method that creates unique identifiers for each atom in a compound, facilitating construction of an atom-resolved metabolic network; the method is guaranteed to generate the same identifier for symmetric atoms, enabling automatic identification of possible additional mappings caused by molecular symmetry.
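
The idea that symmetric atoms end up with the same identifier can be illustrated with generic iterative color refinement on a molecular graph. The sketch below is not the neighborhood-specific method of the paper summarized above, whose details are not given here; the function name `refine_colors` and the input layout (a list of element symbols plus a list of bonded index pairs) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def refine_colors(elements: List[str], bonds: List[Tuple[int, int]]) -> List[int]:
    """Generic color refinement (hypothetical helper, not the published method).

    Each atom starts with a color derived from its element symbol; colors are
    then refined from the atom's own color and the sorted colors of its
    neighbors until the partition stops changing.
    """
    n = len(elements)
    neighbors: Dict[int, List[int]] = {i: [] for i in range(n)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Initial color: dense rank of the element symbol.
    symbols = sorted(set(elements))
    colors = [symbols.index(e) for e in elements]

    for _ in range(n):
        # Signature = own color plus the sorted multiset of neighbor colors.
        signatures = [
            (colors[i], tuple(sorted(colors[j] for j in neighbors[i])))
            for i in range(n)
        ]
        ranking = {sig: rank for rank, sig in enumerate(sorted(set(signatures)))}
        new_colors = [ranking[sig] for sig in signatures]
        if new_colors == colors:  # partition is stable
            break
        colors = new_colors
    return colors


# Heavy-atom skeleton of propane (C-C-C): the two terminal carbons are
# related by symmetry and receive the same color.
propane = refine_colors(["C", "C", "C"], [(0, 1), (1, 2)])

# Heavy-atom skeleton of ethanol (C-C-O): no two atoms are equivalent.
ethanol = refine_colors(["C", "C", "O"], [(0, 1), (1, 2)])
```

Refinement of this kind always gives symmetry-equivalent atoms the same color, but it can also fail to separate atoms that are not equivalent, so a production scheme needs additional tie-breaking rules.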

Robust nomenclature and software for enhanced reproducibility in molecular modeling of small molecules

An automated and verifiable computational pipeline for calculating the force field parameters of small molecules that integrates several software tools and guarantees reproducibility of the parameters by utilizing a standard nomenclature across multiple computational steps and by maintaining file verification identifiers.

NMReDATA, a standard to report the NMR assignment and parameters of organic compounds

A new format is introduced to associate the NMR parameters extracted from 1D and 2D spectra of organic compounds to the proposed chemical structure, an extension of the existing Structure Data Format, which is compatible with the commonly used MOL format.

Probabilistic identification of saccharide moieties in biomolecules and their protein complexes

An algorithm and software package called CTPIC that analyzes the covalent structure of a compound to yield a probabilistic measure for distinguishing saccharides and saccharide-derivatives from non-saccharides is developed.

Tools for Enhanced NMR-Based Metabolomics Analysis.

Developments leading to a rigorous basis for unique identification of compounds, reproducible numbering of atoms, the compact representation of NMR spectra of metabolites and small molecules, tools for improved compound identification, quantification and visualization, and approaches toward the goal of rigorous analysis of metabolomics data are discussed.

BioMagResBank (BMRB) as a Resource for Structural Biology.

The goal is to describe the various BMRB services offered to structural biology researchers and how they can be accessed and utilized, as well as the NMR-STAR data format used by BMRB and the tools provided to facilitate its use.

Applications of Parametrized NMR Spin Systems of Small Molecules.

This work describes how libraries of these spin systems, which use unique and reproducible atom numbering, can improve NMR-based ligand screening and metabolomics studies.



Consistency of systematic chemical identifiers within and between small-molecule databases

It is shown that considerable inconsistency exists in structural representation and systematic chemical identifiers within and between databases, especially when merging data and if systematic identifiers are used as a key index for structure integration or cross-querying several databases.

Get Your Atoms in Order - An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm

A reference Python implementation of the novel canonicalization approach, which uses a standard stable-sorting algorithm instead of a Morgan-like index, is provided as a first step toward a common standard for canonical atom ordering to generate a universal unique identifier for molecules other than InChI.

PubChem Substance and Compound databases

An overview of the PubChem Substance and Compound databases is provided, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access.

The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013

This work has completely aligned the ontology with the Open Biomedical Ontologies (OBO) Foundry-recommended upper level Basic Formal Ontology, and as a result of this effort, the majority of chemical-involving processes in GO are now defined in terms of the ChEBI entities that participate in them.

CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures

CurlySMILES is a chemical line notation which extends SMILES with annotations for storage, retrieval and modeling of interlinked, coordinated, assembled and adsorbed molecules in supramolecular

Fragment screening: an introduction.

This chapter has provided an introduction to the theoretical and practical issues associated with the use of fragment methods and lead-likeness, which offers some significant advantages by providing less complex molecules, which may have better potential for novel drug optimisation and by enabling new chemical space to be more effectively explored.

NMRmix: A Tool for the Optimization of Compound Mixtures in 1D 1H NMR Ligand Affinity Screens

A software tool to assist in creating ideal mixtures from a large panel of compounds with known chemical shifts, NMRmix, utilizes a simulated annealing algorithm to optimize the composition of the mixtures to minimize spectral peak overlaps so that each compound in the mixture is represented by a maximum number of nonoverlapping chemical shifts.
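
A minimal sketch of how simulated annealing can assign compounds to mixtures so that peak overlaps are reduced is given below. It does not reproduce NMRmix: the scoring function (a count of peak pairs closer than a tolerance), the swap move, the cooling schedule, and all names and default values are assumptions chosen for illustration.

```python
import math
import random
from typing import Dict, List, Tuple


def overlap_score(assignment: Dict[str, int],
                  shifts: Dict[str, List[float]],
                  tol: float = 0.025) -> int:
    """Count peak pairs from different compounds in the same mixture within tol ppm."""
    names = list(assignment)
    score = 0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if assignment[a] != assignment[b]:
                continue
            score += sum(1 for x in shifts[a] for y in shifts[b] if abs(x - y) <= tol)
    return score


def anneal_mixtures(shifts: Dict[str, List[float]], n_mixtures: int,
                    steps: int = 2000, t_start: float = 1.0, t_end: float = 0.01,
                    seed: int = 0) -> Tuple[Dict[str, int], int]:
    """Return the best assignment found and its overlap score (hypothetical helper)."""
    rng = random.Random(seed)
    names = sorted(shifts)
    # Start from a round-robin assignment; swaps keep mixture sizes fixed.
    current = {name: i % n_mixtures for i, name in enumerate(names)}
    current_score = overlap_score(current, shifts)
    best, best_score = dict(current), current_score

    for step in range(steps):
        # Geometric cooling from t_start down toward t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(names, 2)
        if current[a] == current[b]:
            continue  # swapping within one mixture changes nothing
        current[a], current[b] = current[b], current[a]
        new_score = overlap_score(current, shifts)
        delta = new_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_score = new_score
            if current_score < best_score:
                best, best_score = dict(current), current_score
        else:
            current[a], current[b] = current[b], current[a]  # undo the swap
    return best, best_score


# Made-up chemical shifts (ppm) for four hypothetical compounds.
example_shifts = {"A": [1.00], "B": [5.00], "C": [1.01], "D": [5.01]}
initial_score = overlap_score({"A": 0, "B": 1, "C": 0, "D": 1}, example_shifts)
best_assignment, best_score = anneal_mixtures(example_shifts, n_mixtures=2)
```

Tracking the best assignment separately from the current one means that the occasional uphill moves accepted at high temperature never worsen the returned result.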

Open Babel: An open chemical toolbox

The implementation of Open Babel is detailed, key advances in the 2.3 release are described, and a variety of uses are outlined both in terms of software products and scientific research, including applications far beyond simple format interconversion.

MassBank: a public repository for sharing mass spectral data for life sciences.

MassBank is the first public repository of mass spectra of small chemical compounds for life sciences and provides a merged spectrum for each compound prepared by merging the analyzed ESI-MS(2) data on an identical compound under different collision-induced dissociation conditions.

Structure- and Ligand-Based Virtual Screening Identifies New Scaffolds for Inhibitors of the Oncoprotein MDM2

A new virtual screening procedure that combines similarity searching and docking is used to identify chemically tractable scaffolds that bind to the p53-interaction site of MDM2.