A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams

Abstract

We propose a multi-partition, multi-chunk ensemble-classifier-based data mining technique to classify concept-drifting data streams. Existing ensemble techniques for classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multi-partition, multi-chunk ensemble technique, we significantly reduce classification error compared to the single-partition, single-chunk ensemble approaches. We have theoretically justified the usefulness of our algorithm, and empirically demonstrated its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
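
The training step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "v-fold partitioning" means each of the v classifiers is trained on the merged r chunks with one fold held out, that the ensemble predicts by unweighted majority vote, and that a nearest-centroid model is an acceptable stand-in base learner. The names train_nearest_centroid, train_mpc, and ensemble_predict, as well as the toy data, are hypothetical. How the ensemble is maintained as new chunks arrive is not stated in the abstract and is left out.

```python
import random
from collections import Counter


def train_nearest_centroid(rows):
    """Stand-in base learner (an assumption, not from the paper).

    rows is a list of (feature_tuple, label) pairs. Returns a predict function
    that assigns the label of the closest class centroid.
    """
    sums, counts = {}, Counter()
    for x, y in rows:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def train_mpc(chunks, v, train_fn=train_nearest_centroid, seed=0):
    """Train v classifiers from the r chunks in `chunks`.

    Assumed reading of v-fold partitioning: merge the chunks, split the data
    into v folds, and train classifier i on every fold except fold i.
    """
    data = [row for chunk in chunks for row in chunk]
    random.Random(seed).shuffle(data)
    folds = [data[i::v] for i in range(v)]
    classifiers = []
    for held_out in range(v):
        train_rows = [
            row for j, fold in enumerate(folds) if j != held_out for row in fold
        ]
        classifiers.append(train_fn(train_rows))
    return classifiers


def ensemble_predict(classifiers, x):
    """Unweighted majority vote over the classifiers (an assumption)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data: r = 2 consecutive chunks, two well-separated classes.
chunk_1 = [((0.0, 0.1), "benign"), ((0.1, 0.0), "benign"), ((0.2, 0.1), "benign"),
           ((1.0, 0.9), "botnet"), ((0.9, 1.0), "botnet"), ((0.8, 0.9), "botnet")]
chunk_2 = [((0.1, 0.2), "benign"), ((0.0, 0.0), "benign"), ((0.2, 0.2), "benign"),
           ((0.9, 0.8), "botnet"), ((1.0, 1.0), "botnet"), ((0.8, 0.8), "botnet")]

classifiers = train_mpc([chunk_1, chunk_2], v=3)
```

Calling ensemble_predict(classifiers, x) then labels a new instance x by vote. In this sketch every classifier sees (v - 1)/v of the data from all r chunks, whereas a single-partition, single-chunk scheme would train one classifier per chunk.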

DOI: 10.1007/978-3-642-01307-2_34

Cite this paper

@inproceedings{Masud2009AMM, title={A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams}, author={Mohammad M. Masud and Jing Gao and Latifur Khan and Jiawei Han and Bhavani M. Thuraisingham}, booktitle={PAKDD}, year={2009} }