Learn More
All molecules of up to 11 atoms of C, N, O, and F possible under consideration of simple valency, chemical stability, and synthetic feasibility rules were generated and collected in a database (GDB). GDB contains 26.4 million molecules (110.9 million stereoisomers), including three- and four-membered rings and triple bonds. By comparison, only 63 857(More)
Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better(More)
The chemical universe database GDB-13, which enumerates 977 million organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules, represents a vast reservoir for new fragments. GDB-13 was classified using the MQN-system discussed in the preceding paper for the analysis of PubChem fragments. Two(More)
The database PubChem was classified using 42 integer value descriptors of molecular structure, here called molecular quantum numbers (MQNs), which count atoms and bond types, polar groups, and topological features. Principal component analysis of the MQN data set shows that PubChem compounds occupy a partially filled elliptical cone in the (PC1,PC2,PC3)(More)
The 4.5 million organic molecules with up to 20 non-hydrogen atoms in PubChem were analyzed using the MQN-system, which consists in 42 integer value descriptors of molecular structure. The 42-dimensional MQN-space was visualised by principal component analysis and representation of the (PC1, PC2), (PC1, PC3) and (PC2, PC3) planes. The molecules were(More)
Herein we review our recent efforts in searching for bioactive ligands by enumeration and virtual screening of the unknown chemical space of small molecules. Enumeration from first principles shows that almost all small molecules (>99.9%) have never been synthesized and are still available to be prepared and tested. We discuss open access sources of(More)
Docking of randomly selected compounds from the chemical universe database GDB-11, which contains all organic molecules up to 11 atoms of C, N, O, F possible under consideration of simple chemical stability and synthetic feasibility rules, into the NMDA receptor glycine site (1pb7.pdb) lead to the identification of 3-(aminomethyl)piperazine-2,5-dione 3 and(More)
The flipping of membrane-embedded lipids containing large, polar head groups is slow and energetically unfavourable, and is therefore catalysed by flippases, the mechanisms of which are unknown. A prominent example of a flipping reaction is the translocation of lipid-linked oligosaccharides that serve as donors in N-linked protein glycosylation. In(More)
SMIfp (SMILES fingerprint) is defined here as a scalar fingerprint describing organic molecules by counting the occurrences of 34 different symbols in their SMILES strings, which creates a 34-dimensional chemical space. Ligand-based virtual screening using the city-block distance CBD(SMIfp) as similarity measure provides good AUC values and enrichment(More)