# Substructure Discovery Using Minimum Description Length and Background Knowledge

@article{Cook1994SubstructureDU, title={Substructure Discovery Using Minimum Description Length and Background Knowledge}, author={Diane Joyce Cook and Lawrence B. Holder}, journal={ArXiv}, year={1994}, volume={cs.AI/9402102} }

The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical…
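The compression idea in the abstract can be illustrated with a rough sketch. It assumes a deliberately crude bit count, not the graph encoding used in the paper, and the names `description_length` and `compression_value` are hypothetical. A substructure S is scored by how much the data shrinks once every instance of S is replaced by a single vertex.

```python
import math


def description_length(num_vertices, num_edges, num_labels):
    """Crude bit count (an assumption, not the paper's encoding):
    every vertex and edge stores one label, and every edge also
    stores its two endpoint indices."""
    if num_vertices == 0:
        return 0.0
    label_bits = math.log2(max(num_labels, 2))
    vertex_bits = math.log2(max(num_vertices, 2))
    return num_vertices * label_bits + num_edges * (label_bits + 2 * vertex_bits)


def compression_value(graph, sub, instances, num_labels):
    """Return DL(G) / (DL(S) + DL(G|S)); a value above 1 means that
    describing S once and then the compressed graph is cheaper than
    describing G directly.

    graph and sub are (vertex_count, edge_count) pairs; instances is
    the number of non-overlapping occurrences of sub in graph.
    """
    gv, ge = graph
    sv, se = sub
    # Each replaced instance collapses into one new vertex.
    compressed_vertices = gv - instances * (sv - 1)
    compressed_edges = ge - instances * se
    before = description_length(gv, ge, num_labels)
    after = (description_length(sv, se, num_labels)
             # One extra label marks the vertices that stand for S.
             + description_length(compressed_vertices, compressed_edges, num_labels + 1))
    return before / after


# A 12-vertex, 15-edge graph containing four copies of a triangle.
value = compression_value(graph=(12, 15), sub=(3, 3), instances=4, num_labels=4)
print(f"compression value: {value:.2f}")
```

Running the same scoring again on the compressed graph is one way to picture the multiple passes mentioned in the abstract.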

## 560 Citations

Substructure Discovery in the SUBDUE System

- Computer Science
- KDD Workshop
- 1994

The SUBDUE system, which uses the minimum description length (MDL) principle to discover substructures that compress the database and represent structural concepts in the data, is described.

An Empirical Study of Domain Knowledge and Its Benefits to Substructure Discovery

- Computer Science
- IEEE Trans. Knowl. Data Eng.
- 1997

Results show that domain specific knowledge improves the search for substructures that are useful to the domain and leads to greater compression of the data.

Subdue: compression-based frequent pattern discovery in graph data

- Computer Science
- 2005

The graph-based data mining system Subdue is described. It uses a heuristic algorithm to discover subgraphs that are not only frequent but also compress the graph dataset.

Finding the most descriptive substructures in graphs with discrete and numeric labels

- Computer Science
- Journal of Intelligent Information Systems
- 2013

This paper explores the relationship between graph structure and the distribution of attribute values and proposes an outlier-detection step, which is used as a constraint during substructure discovery and applies to multi-dimensional numeric attributes.

Approaches to Parallel Graph-Based Knowledge Discovery

- Computer Science
- J. Parallel Distributed Comput.
- 2001

This research investigates approaches for scaling a particular knowledge discovery/data mining system, Subdue, using parallel and distributed resources, and potential achievements and obstacles are discussed.

Structural Pattern Recognition in Graphs

- Computer Science
- 2003

This chapter describes an approach to discovering patterns in relational data represented as a graph based on the minimum description length (MDL) principle, which measures how well various patterns compress the original database.

Discovering Substructures in the Chemical Toxicity Domain

- Computer Science
- 1999

The researcher’s ability to interpret the data and discover interesting patterns within the data is of great importance as it helps in obtaining relevant SARs and identifying conceptually interesting substructures that enhance the interpretation of data.

Structure Discovery from Sequential Data

- Computer Science
- FLAIRS Conference
- 2004

I-Subdue is described, an extension to the Subdue graph-based data mining system that operates over sequentially received relational data to incrementally discover the most representative substructures, overcoming the challenge of locally optimal substructures overshadowing those that are globally optimal.

Coupling Two Complementary Knowledge Discovery Systems

- Computer Science
- FLAIRS Conference
- 1998

This work investigates a simpler integration of the two systems: the structural discovery system is first executed on the data, and its results are then used to augment or compress the data before it is input to the attribute-value-based system.

Exploiting parallelism in knowledge discovery systems to improve scalability

- Computer Science
- Proceedings of the Thirty-First Hawaii International Conference on System Sciences
- 1998

This research outlines a general approach for scaling KDD systems using parallel and distributed resources and applies the suggested strategies to the SUBDUE knowledge discovery system.

## References


Discovery of Inexact Concepts from Structural Data

- Computer Science
- IEEE Trans. Knowl. Data Eng.
- 1993

An implementation of the authors' SUBDUE system that employs an inexact graph match to discover substructures which occur often in the data, but not always in the same form, is described.

A Minimal Encoding Approach to Feature Discovery

- Computer Science
- AAAI
- 1991

This paper discusses unsupervised learning of orthogonal concepts on relational data, which demands a much larger search space than traditional concept learning algorithms, and requires that the concepts be interpretable by a human, an ability not yet realized with connectionist algorithms.

Unifying Learning Methods by Colored Digraphs

- Computer Science
- ALT
- 1993

A graph-based induction algorithm is described that extracts typical patterns from colored digraphs and enables the uniform treatment of the above two learning tasks to solve complex learning problems such as the construction of hierarchical knowledge bases.

Learning Engineering Models with the Minimum Description Length Principle

- Computer Science, Engineering
- AAAI
- 1992

The minimum description length principle, together with the KEDS algorithm, is used to guide the partitioning of the problem space and has been tested on discovering models for predicting the performance efficiencies of an internal combustion engine.

Grammatical Inference Based on Hyperedge Replacement

- Computer Science
- Graph-Grammars and Their Application to Computer Science
- 1990

The main result is a characterization of the inferred grammars as “samples-composing”, meaning that each sample can be derived and each rule contributes to the generation of samples in a certain way.

A Self-Organizing Retrieval System for Graphs

- Computer Science
- AAAI
- 1984

The design of a general knowledge base for labeled graphs is presented. The design involves a partial ordering of graphs represented as subsets of nodes of a universal graph. The knowledge base's…