Warmr: a data mining tool for chemical data

@article{King2001WarmrAD,
  title={Warmr: a data mining tool for chemical data},
  author={Ross D. King and Ashwin Srinivasan and Luc Dehaspe},
  journal={Journal of Computer-Aided Molecular Design},
  year={2001},
  volume={15},
  pages={173--181}
}
Data mining techniques are becoming increasingly important in chemistry as databases become too large to examine manually. [...] Data mining was used to find all frequent substructures in the database, and knowledge of these frequent substructures is shown to add value to the database. One use of the frequent substructures was to convert them into probabilistic prediction rules relating compound description to carcinogenesis. These rules were found to be accurate on test data, and to give some…
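The method summarized above has two steps: find every frequent substructure, then turn frequent substructures into probabilistic prediction rules. Warmr itself searches levelwise for frequent first-order (Datalog) queries using inductive logic programming. The sketch below is a propositional simplification that assumes each compound is already reduced to a set of hypothetical substructure labels. It uses Apriori-style levelwise search and then keeps the patterns whose empirical P(positive | pattern) exceeds a threshold. The labels, data, and thresholds are invented for illustration only.

```python
from itertools import combinations


def frequent_patterns(compounds, min_support):
    """Levelwise (Apriori-style) search: return {pattern: support} for every
    feature set contained in at least `min_support` compounds."""
    items = sorted({f for feats in compounds for f in feats})
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        # Support = number of compounds containing every feature of the pattern.
        counts = {p: sum(1 for feats in compounds if p <= feats) for p in level}
        survivors = [p for p, n in counts.items() if n >= min_support]
        frequent.update((p, counts[p]) for p in survivors)
        # Candidate (k+1)-patterns: unions of two frequent k-patterns whose
        # k-subsets are all frequent (anti-monotonicity pruning).
        kept = set(survivors)
        next_level = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in kept for sub in combinations(union, len(a))
            ):
                next_level.add(union)
        level = list(next_level)
    return frequent


def prediction_rules(compounds, labels, frequent, min_confidence):
    """Turn frequent patterns into rules 'pattern -> positive class', keeping
    those whose empirical confidence P(positive | pattern) is high enough."""
    rules = []
    for pattern, support in frequent.items():
        positives = sum(
            1 for feats, y in zip(compounds, labels) if y and pattern <= feats
        )
        confidence = positives / support
        if confidence >= min_confidence:
            rules.append((pattern, confidence, support))
    # Most confident first; ties broken by support, then alphabetically.
    return sorted(rules, key=lambda r: (-r[1], -r[2], sorted(r[0])))


# Toy, hypothetical data: each compound is a set of substructure labels and
# True marks the (made-up) positive class.
compounds = [
    {"aromatic_ring", "nitro"},
    {"aromatic_ring", "nitro", "amine"},
    {"aromatic_ring", "amine"},
    {"nitro", "halogen"},
    {"aromatic_ring", "halogen"},
]
labels = [True, True, False, True, False]

patterns = frequent_patterns(compounds, min_support=2)
rules = prediction_rules(compounds, labels, patterns, min_confidence=0.75)
for pattern, confidence, support in rules:
    print(sorted(pattern), confidence, support)
```

On this toy data, six patterns reach support 2. Only `['nitro']` (confidence 1.0, support 3) and `['aromatic_ring', 'nitro']` (confidence 1.0, support 2) survive as rules. Candidate pruning keeps the search tractable, because a pattern can only be frequent if all of its sub-patterns are.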
Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis
TLDR
Four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), are described and an exhaustive list of their chemoinformatics applications is attempted.
Frequent substructure-based approaches for classifying chemical compounds
TLDR
A substructure-based classification algorithm is presented that decouples the substructure discovery process from the construction of the classification model and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set.
Frequent Substructure-Based Approaches for Classifying Chemical Compounds
TLDR
A substructure-based classification algorithm is presented that decouples the substructure discovery process from the construction of the classification model and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set.
A Graph Mining Algorithm for Classifying Chemical Compounds
TLDR
A novel graph mining algorithm, MIGDAC (Mining Graph DAta for Classification), is proposed. It applies graph theory and an interestingness measure to discover interesting subgraphs that both characterize a class and easily distinguish it from other classes.
Discovering Interesting Molecular Substructures for Molecular Classification
TLDR
A novel technique called mining interesting substructures in molecular data for classification (MISMOC) is proposed. It discovers interesting frequent subgraphs that not only characterize a molecular class but also distinguish it from the others.
Subgraph Relative Frequency Approach for Extracting Interesting Substructures from Molecular Data
The classification of an unseen molecule in molecular data is done by taking the substructures of the molecule. The mining of interesting substructures in molecular data for classification contain…
Graph based molecular data mining - an overview
  • I. Fischer, T. Meinl
  • Computer Science
  • 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583)
  • 2004
TLDR
An overview of the different methods for graph data mining is given, starting with the greedy searches proposed in the mid-1990s and also covering more recent ideas influenced by market basket analysis.
Data Mining Algorithms for Virtual Screening of Bioactive Compounds
TLDR
A substructure-based classification algorithm is presented that decouples the substructure discovery process from the construction of the classification model and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the dataset.
Subdue: compression-based frequent pattern discovery in graph data
TLDR
The graph-based data mining system Subdue is described. It uses a heuristic algorithm to discover subgraphs that are not only frequent but also compress the graph dataset.
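Subdue's criterion is that a subgraph should compress the data, not merely be frequent. One way to make this concrete is a size-based stand-in for description length: size(G) / (size(S) + size(G|S)), with size = |V| + |E|. Here G|S is the graph with each instance of S collapsed to a single vertex. Subdue's primary measure is minimum description length, so treat this as an approximation. The sketch also assumes the instances do not overlap, and its graph counts are hypothetical.

```python
def graph_size(num_vertices, num_edges):
    """Size-based stand-in for description length: |V| + |E|."""
    return num_vertices + num_edges


def compressed_size(g_vertices, g_edges, s_vertices, s_edges, instances):
    """Size of G|S, assuming `instances` non-overlapping copies of S are each
    collapsed to one vertex. Edges internal to a copy vanish; edges linking a
    copy to the rest of G are kept and reattached to the new vertex."""
    vertices = g_vertices - instances * (s_vertices - 1)
    edges = g_edges - instances * s_edges
    if vertices < 1 or edges < 0:
        raise ValueError("instances do not fit in the graph")
    return graph_size(vertices, edges)


def compression_value(g_vertices, g_edges, s_vertices, s_edges, instances):
    """size(G) / (size(S) + size(G|S)); values above 1 mean S compresses G."""
    return graph_size(g_vertices, g_edges) / (
        graph_size(s_vertices, s_edges)
        + compressed_size(g_vertices, g_edges, s_vertices, s_edges, instances)
    )


# Hypothetical graph with 30 vertices and 32 edges.
ring = compression_value(30, 32, s_vertices=6, s_edges=6, instances=4)
bond = compression_value(30, 32, s_vertices=2, s_edges=1, instances=10)
print(round(ring, 3), round(bond, 3))  # 2.067 1.378
```

The single bond occurs more often (10 instances versus 4), yet the six-membered substructure scores higher because replacing it removes more of the graph. This is the sense in which Subdue prefers patterns that compress rather than patterns that are simply frequent.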
A Randomized Exhaustive Propositionalization Approach for Molecule Classification
TLDR
This work extends the propositionalization approach recently proposed for multirelational data mining in two ways: it generates expressive attributes exhaustively, and it uses randomization to sample a limited set of complex attributes.
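Propositionalization turns each structured example into a fixed-length table of boolean attributes that any attribute-value learner can use. Combining exhaustive generation with random sampling can be sketched as follows. Every conjunction of up to `max_exhaustive_len` features is enumerated, and then `num_sampled` longer conjunctions are drawn at random. The attribute language here (plain conjunctions of atom labels), the parameter names, and the choice to sample from features that co-occur in a real molecule are all assumptions. They are not the cited paper's exact scheme, and the molecules are hypothetical.

```python
import random
from itertools import combinations


def propositionalize(molecules, max_exhaustive_len=1, num_sampled=3, max_len=3, seed=0):
    """Return (attributes, table): attributes are conjunctions of features,
    and table[i][j] is 1 if molecule i satisfies attribute j."""
    rng = random.Random(seed)
    features = sorted({f for mol in molecules for f in mol})
    # Exhaustive part: every conjunction of up to max_exhaustive_len features.
    attributes = [
        frozenset(combo)
        for k in range(1, max_exhaustive_len + 1)
        for combo in combinations(features, k)
    ]
    seen = set(attributes)
    # Randomized part: draw longer conjunctions from features that co-occur in
    # one molecule, so each sampled attribute is true for at least one example.
    candidates = (
        [sorted(mol) for mol in molecules if len(mol) > max_exhaustive_len]
        if max_len > max_exhaustive_len
        else []
    )
    sampled = attempts = 0
    while candidates and sampled < num_sampled and attempts < 100 * num_sampled:
        attempts += 1
        chosen = rng.choice(candidates)
        k = rng.randint(max_exhaustive_len + 1, min(max_len, len(chosen)))
        attribute = frozenset(rng.sample(chosen, k))
        if attribute not in seen:
            seen.add(attribute)
            attributes.append(attribute)
            sampled += 1
    table = [[int(attr <= mol) for attr in attributes] for mol in molecules]
    return attributes, table


molecules = [{"C", "N", "O"}, {"C", "O"}, {"C", "N", "Cl"}]  # hypothetical
attributes, table = propositionalize(molecules)
for attr in attributes:
    print(sorted(attr))
```

The exhaustive part guarantees coverage of short, simple attributes. The random part caps the number of long, expensive ones, which would otherwise grow combinatorially with `max_len`.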

References

Showing 1-10 of 51 references
Analysis of a Large Structure/Biological Activity Data Set Using Recursive Partitioning
TLDR
SCAM is a computer program that makes extremely efficient use of recursive partitioning (RP), a method for statistically determining rules that classify objects into similar categories or, in this case, structures into groups of molecules with similar potencies.
Recursive Partitioning Analysis of a Large Structure-Activity Data Set Using Three-Dimensional Descriptors
TLDR
The idea is to encode the three-dimensional features of chemical compounds into bit strings and use recursive partitioning (RP) to determine the important features that statistically correlate with the biological activities of these compounds.
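Recursive partitioning over descriptor bit strings repeatedly picks the one bit whose on/off split best separates compounds by potency, then recurses into each half. The sketch below scores splits by the reduction in the sum of squared deviations of potency. That is a common criterion, but it is not necessarily the statistic used in the cited works. The bit strings, potencies, and the `min_size` and `min_gain` stopping parameters are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, potencies, min_size):
    """Pick the bit whose on/off split most reduces the potency SSE,
    ignoring splits that leave fewer than min_size compounds on a side."""
    parent = sse(potencies)
    best = None
    for bit in range(len(rows[0])):
        on = [p for r, p in zip(rows, potencies) if r[bit]]
        off = [p for r, p in zip(rows, potencies) if not r[bit]]
        if len(on) < min_size or len(off) < min_size:
            continue
        gain = parent - sse(on) - sse(off)
        if best is None or gain > best[1]:
            best = (bit, gain)
    return best


def grow_tree(rows, potencies, min_size=2, min_gain=1e-9):
    """Split recursively; stop when no valid split exists or the gain is negligible."""
    split = best_split(rows, potencies, min_size)
    if split is None or split[1] <= min_gain:
        return {"mean": sum(potencies) / len(potencies), "n": len(rows)}
    bit = split[0]
    on_idx = [i for i, r in enumerate(rows) if r[bit]]
    off_idx = [i for i, r in enumerate(rows) if not r[bit]]
    return {
        "bit": bit,
        "on": grow_tree([rows[i] for i in on_idx], [potencies[i] for i in on_idx], min_size, min_gain),
        "off": grow_tree([rows[i] for i in off_idx], [potencies[i] for i in off_idx], min_size, min_gain),
    }


def predict(tree, bits):
    """Follow the splits down to a leaf and return its mean potency."""
    while "bit" in tree:
        tree = tree["on"] if bits[tree["bit"]] else tree["off"]
    return tree["mean"]


# Hypothetical 4-bit descriptor strings and potencies.
rows = [(1, 0, 1, 0), (1, 1, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 0, 1, 1), (0, 1, 0, 0)]
potencies = [7.0, 7.2, 6.8, 4.0, 4.2, 3.8]
tree = grow_tree(rows, potencies)
print(tree["bit"], predict(tree, (1, 0, 0, 0)), predict(tree, (0, 0, 0, 0)))
```

Here bit 0 cleanly separates the high-potency group (mean about 7.0) from the low-potency group (mean about 4.0), so it becomes the root split. Each child holds three compounds, too few for another split with `min_size=2`.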
The discovery of indicator variables for QSAR using inductive logic programming
TLDR
It is concluded that inductive logic programming (ILP) can aid the drug design process, yielding a QSAR method that combines the strength of ILP at describing steric structure with the familiarity and power of linear regression.
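In the approach summarized above, structural features discovered by ILP enter an ordinary linear QSAR regression as 0/1 indicator variables alongside conventional descriptors. The minimal sketch below fits activity = b0 + b1·logP + b2·I by least squares, where I flags a structural feature. The descriptor, the indicator, and the data are all hypothetical. The activities are generated synthetically from chosen coefficients so the fit can be checked against them; they are not measured values.

```python
def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan
    elimination with partial pivoting. Each row of X includes a leading 1.0
    for the intercept."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    m = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * c for a, c in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Hypothetical descriptor values and an ILP-style structural indicator.
log_p = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
has_feature = [0, 1, 0, 1, 1, 0]
# Synthetic activities generated from b0 = 1.0, b1 = 0.5, b2 = 2.0.
activity = [1.0 + 0.5 * x + 2.0 * i for x, i in zip(log_p, has_feature)]

X = [[1.0, x, float(i)] for x, i in zip(log_p, has_feature)]
intercept, slope, indicator_effect = fit_least_squares(X, activity)
print(round(intercept, 6), round(slope, 6), round(indicator_effect, 6))
```

The indicator coefficient reads directly as the activity shift associated with the structural feature, holding logP fixed. This makes a discovered substructure easy to interpret inside a familiar regression.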
Quantitative Drug Design: A Critical Introduction
  • B. Levinson
  • Medicine
  • The Yale Journal of Biology and Medicine
  • 1979
TLDR
This eighth book is a positively delightful addition to the Medicinal Research Series, and Martin's emphasis is on the newer methods which have helped drug design develop over the past 20 years from inspired guesswork to a rational, quantitative (if still semi-empirical) discipline, involving extensive computer calculations.
Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins.
TLDR
The main features of the CoMFA approach are described, exemplified by analyses of the affinities of 21 varied steroids for corticosteroid- and testosterone-binding globulins, together with a number of advances in the methodology of molecular graphics.
Machine learning
TLDR
Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Artificial intelligence approach to structure-activity studies. Computer automated structure evaluation of biological activity of organic molecules
Application to the study of the carcinogenicity of polycyclic aromatic hydrocarbons, the carcinogenicity of N-nitrosamines in rats, and the pesticidal activity of carbamates of…
Inductive logic programming - techniques and applications
TLDR
Applications of inductive logic programming: learning rules for early diagnosis of rheumatic diseases; finite element mesh design; an overview of selected ILP applications.
Lecture Notes in Artificial Intelligence
TLDR
The topics in LNAI include automated reasoning, automated programming, algorithms, knowledge representation, agent-based systems, intelligent systems, expert systems, machine learning, natural-language processing, machine vision, robotics, search systems, knowledge discovery, data mining, and related programming languages.
Correlation of Biological Activity of Phenoxyacetic Acids with Hammett Substituent Constants and Partition Coefficients
In the ‘two-point attachment’ theory on the mechanism of action of growth regulators of the auxin type, we have assumed as a working hypothesis that the reaction between auxin and substrate is more…