MSDD as a Tool for Classification

Abstract

The Multi-Stream Dependency Detection algorithm has been applied to a variety of classification problems from the UC Irvine repository to assess performance and operating characteristics on "real world", rather than artificial, data sets. Although MSDD was not designed to be a classifier, its performance on a few initial problems prompted further exploration. In this memo I describe the performance of MSDD on the various problems that have been tested to date, and compare that performance to other results published in the machine learning literature. The majority of the problems discussed herein were chosen from a list of thirteen presented in [3] as being a minimal representative set that covers several important features that distinguish problem domains. The MSDD algorithm was used with non-redundant child generation, left-to-right instantiation of wildcards, and the S2 heuristic in all cases. Real-valued features in the data sets were binned into between five and twenty equally sized bins. The size was chosen after briefly experimenting to see where the best accuracy was obtained. Unless otherwise noted, each data set was randomly distributed into a set containing 2/3 of the instances for training and another set containing the remaining 1/3 for testing. MSDD was run on ten different random splits, resulting in a mean classification accuracy.
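The evaluation protocol described above (binning real-valued features, then averaging accuracy over ten random 2/3 to 1/3 splits) can be sketched in Python. This is a minimal illustration under stated assumptions, not the memo's actual code: "equally sized bins" is read here as equal-width bins, all function names are hypothetical, and `majority_class_trainer` is a placeholder baseline standing in for MSDD, which is not implemented here.

```python
import random
from collections import Counter


def equal_width_bin(values, n_bins):
    """Map each real value to a bin index in [0, n_bins - 1].

    Assumption: "equally sized" means equal width over the observed range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate feature: every value falls into the same bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def split_two_thirds(instances, rng):
    """Shuffle a copy of the instances and split it 2/3 train, 1/3 test."""
    shuffled = list(instances)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def mean_accuracy(instances, train_fn, n_splits=10, seed=0):
    """Average test accuracy over n_splits random 2/3 to 1/3 splits.

    instances: list of (features, label) pairs.
    train_fn:  takes training instances, returns a predict(features) function.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_splits):
        train, test = split_two_thirds(instances, rng)
        predict = train_fn(train)
        correct = sum(1 for features, label in test if predict(features) == label)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def majority_class_trainer(train):
    """Placeholder learner (NOT MSDD): always predicts the most common label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common
```

Any learner that fits the `train_fn` interface could be substituted for the placeholder. The number of bins would be varied between five and twenty per data set, as the abstract describes, and the setting with the best accuracy kept.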

Cite this paper

@inproceedings{Oates1994MSDDAA, title={MSDD as a Tool for Classification}, author={Tim Oates}, year={1994} }