Frequent substructure-based approaches for classifying chemical compounds

@article{Deshpande2005FrequentSA,
  title={Frequent substructure-based approaches for classifying chemical compounds},
  author={Mukund Deshpande and Michihiro Kuramochi and George Karypis},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2005},
  volume={17},
  pages={1036--1050}
}
Computational techniques that build models to correctly assign chemical compounds to various classes of interest have many applications in pharmaceutical research and are used extensively at various phases during the drug development process. These techniques are used to solve a number of classification problems such as predicting whether or not a chemical compound has the desired biological activity, is toxic or nontoxic, and filtering out drug-like compounds from large compound libraries… 
Frequent Substructure-Based Approaches for Classifying Chemical Compounds
TLDR
A substructure-based classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set.
Frequent sub-structure-based approaches for classifying chemical compounds
TLDR
A substructure-based classification algorithm is presented that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the dataset.
Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification
TLDR
GraphSig successfully overcomes the scalability bottleneck of mining patterns at a low frequency and is found to be more informative than features generated exhaustively by traditional fingerprints; this has potential in providing scaffolds and lead generation.
Acyclic Subgraph Based Descriptor Spaces for Chemical Compound Retrieval and Classification
TLDR
This paper introduces and describes algorithms for efficiently generating a new set of descriptors that are derived from all connected acyclic fragments present in the molecular graphs, and introduces an extension to existing vector-based kernel functions to take into account the length of the fragments present in the descriptors.
Molecular Substructure Mining Approaches for Computer-Aided Drug Discovery: A Review
mining is a well-established technique used frequently in drug discovery. Its aim is to discover and characterize interesting 2D substructures present in chemical datasets. The popularity of the
Fast rule-based bioactivity prediction using associative classification mining
TLDR
This study utilizes a collection of methods, called associative classification mining (ACM), which are popular in the data mining community, but so far have not been applied widely in cheminformatics.
TR 07-010 Methods for Effective Virtual Screening and Scaffold-Hopping in Chemical Compounds
TLDR
Experimental evaluation shows that many of these methods substantially outperform previously developed approaches both in terms of their ability to identify structurally diverse active compounds as well as active compounds in general.
Support Vector Machine Classifier for Predicting Drug Binding to P-glycoprotein
TLDR
Development of a prediction method to identify substrates and nonsubstrates of P-glycoprotein, based on a support vector machine algorithm, using a combination of descriptors encoding substructure types and their relative positions in the drug molecule, thus considering both the chemical nature and the three-dimensional shape information.
Querying and mining chemical databases for drug discovery
TLDR
This thesis proposes core indexing and mining algorithms that extend the current state of the art in computer science research and are applicable in other scientific domains such as software bug mining, analysis of communication graphs, social networks, sensor networks, and transportation networks.
Breadth-First Search Approach to Enumeration of Tree-like Chemical Compounds
TLDR
Efficient algorithms, BfsSimEnum and BfsMulEnum, are proposed to enumerate tree-like molecules without and with multiple bonds, respectively, where chemical compounds are represented as molecular graphs, to reduce the large search space.

References

Frequent sub-structure-based approaches for classifying chemical compounds
TLDR
A substructure-based classification algorithm is presented that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the dataset.
Finding Frequent Substructures in Chemical Compounds
TLDR
This paper applies data mining to the problem of predicting chemical carcinogenicity, and presents a knowledge discovery method for structured data, where patterns reflect the one-to-many and many-to-many relationships of several tables.
Comparisons of classification methods for screening potential compounds
  • Aijun An, Yuanyuan Wang
  • Computer Science
    Proceedings 2001 IEEE International Conference on Data Mining
  • 2001
TLDR
A number of data mining and statistical methods are compared on the drug design problem of modeling molecular structure-activity relationships, which can be used to identify active compounds based on their chemical structures from a large inventory of chemical compounds.
Analysis of a Large Structure/Biological Activity Data Set Using Recursive Partitioning
TLDR
SCAM is a computer program implemented to make extremely efficient use of recursive partitioning, a method for statistically determining rules that classify objects into similar categories or, in this case, structures into groups of molecules with similar potencies.
Mining molecular fragments: finding relevant substructures of molecules
  • C. Borgelt, M. Berthold
  • Computer Science
    2002 IEEE International Conference on Data Mining, 2002. Proceedings.
  • 2002
TLDR
An algorithm is presented to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context; it results in substantially faster search by eliminating the need for frequent, computationally expensive reembeddings and by suppressing redundant search.
Warmr: a data mining tool for chemical data
TLDR
Warmr is presented, the first ILP data mining algorithm to be applied to chemoinformatic data, and the substructures were used to prove that there existed no accurate rule, based purely on atom-bond substructure with less than seven conditions, that could predict carcinogenicity.
Automated Approaches for Classifying Structures
TLDR
An algorithm that first mines the chemical compound dataset to discover discriminating sub-structures, which are then used as features to build a powerful classifier that requires very little domain knowledge and can easily handle large chemical datasets.
Recursive Partitioning Analysis of a Large Structure-Activity Data Set Using Three-Dimensional Descriptors
TLDR
The idea is to encode the three-dimensional features of chemical compounds into bit strings and use RP to determine the important features that statistically correlate to the biological activities of these compounds.
Analysis of Large Screening Data Sets via Adaptively Grown Phylogenetic-Like Trees
TLDR
Experimental results show that the proposed method can improve significantly both the ease of extraction and amount of knowledge discovered from screening data sets, as well as the main differences with the methods currently in use.
Molecular feature mining in HIV data
TLDR
The application of Feature Mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database is presented, with the aim of detecting molecular substructures that are frequent in the active molecules, and infrequent in the inactives.