Clustering is an important task in mining evolving data streams. Besides the limited memory and one-pass constraints, the nature of evolving data streams implies the following requirements for stream clustering: no assumption on the number of clusters, discovery of clusters with arbitrary shape, and the ability to handle outliers. While a lot of clustering…
Mining data streams poses great challenges due to the limited memory availability and real-time query response requirement. Clustering an evolving data stream is especially interesting because it captures not only the changing distribution of clusters but also the evolving behaviors of individual clusters. In this paper, we present a novel method for…
Density estimation is a costly operation for computing the distribution information of data sets, and it underlies many important data mining applications, such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrive continuously and in large volumes, because of their requirement for linear…
The spread and resonance of users' opinions on SinaWeibo, the most popular micro-blogging website in China, are tremendously influential and have significantly affected the course of many real-world hot social events. We select 21 hot events that were widely discussed on SinaWeibo in 2011 and perform statistical analyses on them. Our main findings are that (i)…
It is challenging to maintain frequent items over a data stream with a small bounded memory in a dynamic environment where both insertion and deletion of items are allowed. In this paper, we propose a novel algorithm, called hCount, which can handle both insertion and deletion of items with much less memory space than the best reported algorithm.…
The filtering of XML data is the basis of many complex applications. Many algorithms have been proposed to … One important challenge is that the number of path queries is huge, so an efficient data structure is needed to represent them. Another challenge is that these path queries usually vary over time. The maintenance of path queries…
Outlier detection techniques are widely used in many applications, such as credit card fraud detection and monitoring criminal activities in electronic commerce. These applications attempt to identify outliers as noise, exceptions, or objects around the border. The existing density-based local outlier detection assigns to each object a degree of being an…
As a widely used data mining technique, outlier detection is a process that aims to find anomalies with good explanations. Most existing methods are designed for numeric data. However, they run into problems in real-life applications, which always contain categorical data. In this paper, we introduce a novel outlier mining method based on hypergraph…