Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended …
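As an illustration of the basic task described in this abstract, the following is a minimal level-wise sketch in Python. It is not the method of the paper; the function name `frequent_itemsets`, the `min_support` parameter (an absolute count), and the toy shopping baskets are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at
    least `min_support` transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions)) if transactions else []

    result = {}
    # Level 1 candidates: every single item.
    candidates = [frozenset([item]) for item in items]
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)

        # Join pairs of frequent k-itemsets into (k+1)-item candidates.
        next_level = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == len(a) + 1:
                next_level.add(union)
        candidates = list(next_level)
    return result


# Toy data, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
demo = frequent_itemsets(baskets, min_support=2)
for itemset, count in sorted(demo.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Because any superset of an infrequent itemset is itself infrequent, building each level only from the frequent itemsets of the previous level does not miss any frequent combination.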
We present the learning system Maccent, which addresses the novel task of stochastic MAximum ENTropy modeling with Clausal Constraints. The Maximum Entropy method is a Bayesian method based on the principle that the target stochastic model should be as uniform as possible, subject to known constraints. Maccent incorporates clausal constraints that are based on …
The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity. This toxicology application was launched at IJCAI'97 as a research challenge for artificial intelligence. Our approach to the problem …
Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described …
MOTIVATION: Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure) using the Escherichia coli genome …
Cervical neoplasia-specific biomarkers, e.g. DNA methylation markers, with high sensitivity and specificity are urgently needed to improve current population-based screening for (pre)malignant cervical neoplasia. We aimed to identify new cervical neoplasia-specific DNA methylation markers and to design and validate a methylation marker panel for triage of …
The clausal discovery engine Claudien is presented. Claudien is an inductive logic programming engine that fits in the descriptive data mining paradigm. Claudien addresses characteristic induction from interpretations, a task which is related to existing formalisations of induction in logic. In characteristic induction from interpretations, the regularities …
Data mining techniques are becoming increasingly important in chemistry as databases become too large to examine manually. Data mining methods from the field of Inductive Logic Programming (ILP) have potential advantages for structural chemical data. In this paper we present Warmr, the first ILP data mining algorithm to be applied to chemoinformatic data. …