Clustering is an important task in mining evolving data streams. Besides the limited-memory and one-pass constraints, the nature of evolving data streams implies the following requirements for stream clustering: no assumption on the number of clusters, discovery of clusters with arbitrary shape, and the ability to handle outliers. While a lot of clustering…
It is a challenge to maintain frequent items over a data stream, with a small bounded memory, in a dynamic environment where both insertion and deletion of items are allowed. In this paper, we propose a novel algorithm, called hCount, which can handle both insertion and deletion of items with much less memory space than the best reported algorithm.
Density estimation is a costly operation for computing distribution information of data sets underlying many important data mining applications, such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrive continuously and in large volumes, because of their requirement for linear…
Mining data streams poses great challenges due to limited memory availability and real-time query response requirements. Clustering an evolving data stream is especially interesting because it captures not only the changing distribution of clusters but also the evolving behaviors of individual clusters. In this paper, we present a novel method for…
With the development of positioning technologies and the growing deployment of inexpensive location-aware sensors, large volumes of trajectory data have emerged. However, efficient and scalable query processing over trajectory data remains a major challenge. In this paper, we explore a new approach to this problem, presenting a new framework for query…
The filtering of XML data is the basis of many complex applications, and many algorithms have been proposed to solve this problem. One important challenge is that the number of path queries is huge, so an efficient data structure is needed to represent them. Another challenge is that these path queries usually vary with time. The…
As a widely used data mining technique, outlier detection is a process that aims to find anomalies with good explanations. Most existing methods are designed for numeric data. However, they run into problems in real-life applications, which often contain categorical data. In this paper, we introduce a novel outlier mining method based on a hypergraph model…
As one of the most important technologies for implementing large-scale distributed systems, peer-to-peer (P2P) computing has attracted much attention in both research and academia communities for its advantages, such as high availability, high performance, and high flexibility in adapting to network dynamics. However, multidimensional data indexing remains a…