A new algorithm for identifying three-dimensional configurations of chemical features common to a set of molecules is described. The algorithm scores each configuration based on both the degree to which it is common to the input set and its estimated rarity. The algorithm can be applied to molecules with large (several hundred) conformational models.
Structure-based screening using fully flexible docking is still too slow for large molecular libraries. High-quality docking of a million-molecule library can take days even on a cluster with hundreds of CPUs. This performance issue prohibits the use of fully flexible docking in the design of large combinatorial libraries. We have developed a fast…
High content screening is a method for identifying small molecule modulators of mammalian cell biology. The nature of the experiment generates an enormous amount of data in the form of photographic images of cells after treatment with compounds of interest. The interpretation of data from these experiments is challenging both in terms of automatically…
As high-throughput techniques in chemical synthesis and screening improve, more demands are placed on computer-assisted design and virtual screening. Many of these computational methods require one or more three-dimensional conformations for molecules, creating a demand for a conformational analysis tool that can rapidly and robustly cover the low-energy…
Molecules are often represented as bit string fingerprints in databases. These bit strings are used for similarity searching with the Tanimoto coefficient and for rapid indexing. A new data structure is introduced, the compressed bit binary tree, that speeds up searching and indexing by up to a factor of 30. Results will be shown for databases of up…
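As a point of reference for the similarity search mentioned above, the following is a minimal sketch of the Tanimoto coefficient on bit string fingerprints, followed by a plain linear scan of a database. It is not the compressed bit binary tree from the abstract, whose details are not given here. The function names `tanimoto` and `linear_search`, the use of Python integers as bit strings, and the convention for two empty fingerprints are all assumptions made for illustration.

```python
def popcount(bits: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(bits).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: bits set in both / bits set in either."""
    union = popcount(a | b)
    if union == 0:
        # Assumed convention: two all-zero fingerprints count as identical.
        return 1.0
    return popcount(a & b) / union


def linear_search(query: int, database: list[int], threshold: float) -> list[tuple[int, float]]:
    """Baseline scan (hypothetical helper): compare the query against every entry."""
    hits = []
    for index, fingerprint in enumerate(database):
        score = tanimoto(query, fingerprint)
        if score >= threshold:
            hits.append((index, score))
    return hits
```

A linear scan like this touches every fingerprint in the database, which is the cost that an indexing structure aims to reduce.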
The K-means method is a popular technique for clustering data into k partitions. In the adaptive form of the algorithm, Lloyd's method, an iterative procedure alternately assigns cluster membership based on a set of centroids and then redefines the centroids based on the computed cluster membership. The most time-consuming part of this algorithm is the…
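The alternation described above can be sketched as follows. This is a minimal, unoptimized illustration of Lloyd's method and not the accelerated variant the abstract goes on to describe. The function name `lloyd_kmeans`, the choice of the first k points as initial centroids, and the iteration cap are assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def lloyd_kmeans(points, k, max_iterations=100):
    # Assumption: seed the centroids with the first k points (simple, deterministic).
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # membership unchanged, so the centroids would not move
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment
```

For example, `lloyd_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the three points near the origin from the three points near (10, 10).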
The searching and characterization of large chemical databases has recently provoked much interest, particularly with respect to the question of whether any of the compounds in the database could serve as new leads to a compound of pharmacological interest. This paper introduces a fast and novel method of determining whether any of a given series of…