A Grid-Based Clustering Algorithm for High-Dimensional Data Streams

@inproceedings{Lu2005AGC,
  title={A Grid-Based Clustering Algorithm for High-Dimensional Data Streams},
  author={Yansheng Lu and Yufen Sun and Guiping Xu and Gang Liu},
  booktitle={ADMA},
  year={2005}
}
The three main requirements for clustering data streams online are a single pass over the data, high processing speed, and low memory consumption. We propose an algorithm that meets these requirements by introducing an incremental grid data structure that summarizes the data stream online. To handle high-dimensional data, the algorithm uses a simple heuristic to select a subset of dimensions on which all clustering operations are performed. Our…
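
To make the idea of an incremental grid summary concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the grid layout nor the dimension-selection heuristic, so the names `GridSummary`, `cell_width`, `min_count`, and `connected_clusters` are hypothetical, and the selected dimensions are simply passed in as `dims`. Each point is visited once and only per-cell counts are stored, so memory grows with the number of occupied cells rather than with the length of the stream.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Cell = Tuple[int, ...]


class GridSummary:
    """One-pass grid summary of a stream (illustrative assumption, not the paper's structure)."""

    def __init__(self, dims: Sequence[int], cell_width: float) -> None:
        self.dims = tuple(dims)        # assumed to come from some dimension-selection heuristic
        self.cell_width = cell_width   # hypothetical uniform cell width on every selected dimension
        self.counts: Dict[Cell, int] = defaultdict(int)

    def cell_of(self, point: Sequence[float]) -> Cell:
        # Project the point onto the selected dimensions and discretize each coordinate.
        return tuple(math.floor(point[d] / self.cell_width) for d in self.dims)

    def insert(self, point: Sequence[float]) -> None:
        # Constant work per point: update a single cell counter.
        self.counts[self.cell_of(point)] += 1

    def dense_cells(self, min_count: int) -> Set[Cell]:
        return {cell for cell, n in self.counts.items() if n >= min_count}


def connected_clusters(cells: Set[Cell]) -> List[Set[Cell]]:
    """Group cells that are neighbors (differ by 1 in exactly one coordinate)."""
    remaining = set(cells)
    clusters: List[Set[Cell]] = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            cell = frontier.pop()
            for i in range(len(cell)):
                for step in (-1, 1):
                    neighbor = cell[:i] + (cell[i] + step,) + cell[i + 1:]
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        cluster.add(neighbor)
                        frontier.append(neighbor)
        clusters.append(cluster)
    return clusters


# Usage: summarize a tiny 3-dimensional stream using only dimensions 0 and 1.
grid = GridSummary(dims=(0, 1), cell_width=1.0)
for p in [(0.1, 0.2, 9.0), (0.3, 0.4, 1.0), (1.2, 0.1, 5.0), (5.5, 5.5, 0.0)]:
    grid.insert(p)
clusters = connected_clusters(grid.dense_cells(min_count=1))
```

In this sketch, clusters are read off the summary on demand by joining adjacent dense cells. That is one common grid-clustering convention and is chosen here only for illustration.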