Visualisation of the chemical space of fragments, lead-like and drug-like molecules in PubChem


The 4.5 million organic molecules with up to 20 non-hydrogen atoms in PubChem were analyzed using the MQN-system, which consists of 42 integer-valued descriptors of molecular structure. The 42-dimensional MQN-space was visualised by principal component analysis and representation of the (PC1, PC2), (PC1, PC3) and (PC2, PC3) planes. The molecules were organized according to ring count (PC1, 38% of variance), molecular size (PC2, 25% of variance), and H-bond acceptor count (PC3, 12% of variance). Compounds following Lipinski's bioavailability, Oprea's lead-likeness and Congreve's fragment-likeness criteria formed separate groups in MQN-space, visible in the (PC2, PC3) plane. MQN-similarity searches of the 4.5 million molecules (see the browser available at ) gave significant enrichment factors for recovering groups of fragment-sized bioactive compounds related to ten different biological targets taken from ChEMBL, allowing lead-hopping relationships not seen with substructure fingerprint similarity searches. The diversity of different compound series was analyzed by MQN-distance histograms.
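The projection and similarity search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptor matrix is synthetic random data standing in for real MQN values, the function names `pca_project` and `city_block_neighbours` are hypothetical, and the use of a city-block (Manhattan) distance as the similarity measure is an assumption, since the abstract does not name the metric.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Centre X and project it onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes; S holds the singular values.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)          # fraction of variance per component
    scores = Xc @ Vt[:n_components].T        # coordinates in PC space
    return scores, explained[:n_components]


def city_block_neighbours(X, query, k=5):
    """Return indices and distances of the k rows of X closest to query.

    Assumption: city-block (L1) distance between descriptor vectors.
    """
    d = np.abs(X - query).sum(axis=1)
    order = np.argsort(d, kind="stable")[:k]
    return order, d[order]


# Synthetic stand-in for a table of 42 integer-valued descriptors per molecule.
rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(1000, 42)).astype(float)

scores, explained = pca_project(X, n_components=3)
idx, dist = city_block_neighbours(X, X[0], k=5)

# scores[:, [0, 1]], scores[:, [0, 2]] and scores[:, [1, 2]] correspond to the
# (PC1, PC2), (PC1, PC3) and (PC2, PC3) planes.
print("variance fractions:", np.round(explained, 3))
print("nearest neighbours of molecule 0:", idx, dist)
```

With real descriptors, the variance fractions would play the role of the 38%, 25% and 12% figures quoted above; on random data they carry no chemical meaning.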

DOI: 10.1007/s10822-011-9437-x

Cite this paper

@article{Deursen2011VisualisationOT,
  title={Visualisation of the chemical space of fragments, lead-like and drug-like molecules in PubChem},
  author={Ruud van Deursen and Lorenz C. Blum and Jean-Louis Reymond},
  journal={Journal of Computer-Aided Molecular Design},
  year={2011},
  volume={25},
  number={7},
  pages={649--662}
}