This paper introduces mass estimation, a base modelling mechanism in data mining. It provides the theoretical basis of mass and an efficient method to estimate mass. We show that it solves problems very effectively in tasks such as information retrieval, regression and anomaly detection. The models, which use mass in these three tasks, perform at least as...
Swarm-based clustering has enthused researchers for its ability to find clusters in datasets automatically, without requiring users to specify the number of clusters. While conventional wisdom suggests that swarm intelligence contributes to this ability, recent works have provided an alternative explanation about the underlying stochastic heuristics that are...
In this paper, we remove the ant-metaphor from ant-based clustering using a randomised partitioning method followed by an agglomerative clustering procedure. While our model only adopts part of the ant-based heuristics, it has produced results that are comparable to the ant-based model. Our approach is based on the fact that one ant can produce the same...
Market Basket Analysis often involves applying the de facto association rule mining method on massive sales transaction data. In this paper, we argue that association rule mining is not always the most suitable method for analysing big market-basket data. This is because the data matrix to be used for association rule mining is usually large and sparse, ...
This paper introduces mass estimation, a base modelling mechanism that can be employed to solve various tasks in machine learning. We present the theoretical basis of mass and efficient methods to estimate mass. We show that mass estimation solves problems effectively in tasks such as information retrieval, regression and anomaly detection. The models, which...
One common approach in swarm-based clustering is to use agents to create a set of clusters on a two-dimensional grid, and then use an existing clustering method to retrieve the clusters on the grid. The second step, which we call grid-cluster retrieval, is an essential step to obtain an explicit partitioning of data. In this study, we highlight the issues...