Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended …
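To make the itemset task above concrete, here is a minimal Python sketch of level-wise (Apriori-style) frequent itemset discovery. It is an illustration only, not the method of the paper summarized above; the function name `frequent_itemsets`, the `min_support` parameter (an absolute count of examples), and the toy shopping data are assumptions introduced for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at least
    `min_support` transactions (hypothetical helper, illustration only)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    frequent = {}
    # Level 1 candidates: every single item.
    current = [frozenset([item]) for item in items]
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        # Build candidates one item larger, keeping only those whose
        # subsets of the current size are all frequent (anti-monotonicity).
        k = len(next(iter(level))) + 1 if level else 0
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


# Toy data (assumed): each set is one example.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a threshold of 2, the pair `{"milk", "butter"}` appears in only one basket, so it is dropped, and the three-item set is never counted because one of its subsets is infrequent.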
The application of algorithms for efficiently generating association rules is so far restricted to cases where information is put together in a single relation. We describe how this restriction can be overcome through the combination of the available algorithms with standard techniques from the field of inductive logic programming. We present the system Warmr, …
The clausal discovery engine CLAUDIEN is presented. CLAUDIEN is an inductive logic programming engine that fits in the descriptive data mining paradigm. CLAUDIEN addresses characteristic induction from interpretations, a task which is related to existing formalisations of induction in logic. In characteristic induction from interpretations, the regularities …
The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity. This toxicology application was launched at IJCAI’97 as a research challenge for artificial intelligence. Our approach to the problem is …
Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described …
We present the learning system Maccent, which addresses the novel task of stochastic MAximum ENTropy modeling with Clausal Constraints. The Maximum Entropy method is a Bayesian method based on the principle that the target stochastic model should be as uniform as possible, subject to known constraints. Maccent incorporates clausal constraints that are based on …
The generic task of Inductive Logic Programming (ILP) is to search a predefined subspace of first-order logic for hypotheses that in some respect explain examples and background knowledge. In this paper we consider the development of parallel implementations of ILP systems. A first part discusses the division of the ILP-task into subtasks that can be handled …
Cervical neoplasia-specific biomarkers, e.g. DNA methylation markers, with high sensitivity and specificity are urgently needed to improve current population-based screening on (pre)malignant cervical neoplasia. We aimed to identify new cervical neoplasia-specific DNA methylation markers and to design and validate a methylation marker panel for triage of …
MOTIVATION: Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure) using the Escherichia coli genome …